Issue with --estimate-poly-a

nanoporetech / dorado

Oxford Nanopore's Basecaller

https://nanoporetech.com/


Issue with --estimate-poly-a #1127

Closed Suchi-alt closed 2 weeks ago

Suchi-alt commented 2 weeks ago

Hi,

I’m encountering an issue with the pt:i tag in the BAM file output. Despite adding primers to the polyaconfig.toml file, the pt:i values don’t align as expected with the corresponding reads. I’ve tried adjusting parameters like the flank threshold and primer definitions, but the values still seem inconsistent.

I’m working with pod5 files and exploring ways to calculate polyA lengths. Would it make sense to write a Python or Bash script to calculate polyA lengths for each fastq file as a means to validate the pt:i values? Since I’m new to sequencing, I want to ensure I’m not overlooking something fundamental.

Any guidance would be greatly appreciated!

malton-ont commented 2 weeks ago

Hi @Suchi-alt,

What do you mean when you say "the pt:i values don’t align as expected with the corresponding reads"?

Are you comparing these values to the actual basecalled sequence? The basecalled sequence within the polyA region is known to be unreliable, which is why dorado calculates the true polyA length in a different manner.
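For anyone who still wants the rough sanity check asked about above, here is a minimal Python sketch that puts the pt:i tag next to the longest run of A in the basecalled sequence. It assumes the records come as SAM text lines (for example from `samtools view`), and the helper names `longest_homopolymer`, `pt_tag` and `compare` are hypothetical, not part of dorado. Since the basecalled polyA region is unreliable, a mismatch between the two numbers is expected and does not by itself mean pt:i is wrong.

```python
import re


def longest_homopolymer(seq, base="A"):
    """Length of the longest uninterrupted run of `base` in `seq` (0 if none)."""
    runs = re.findall(f"{re.escape(base)}+", seq.upper())
    return max((len(run) for run in runs), default=0)


def pt_tag(sam_line):
    """Return the integer value of the pt:i tag, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional tags follow.
    for tag in fields[11:]:
        if tag.startswith("pt:i:"):
            return int(tag[len("pt:i:"):])
    return None


def compare(sam_line):
    """Return (read name, pt:i value, longest A run in the basecalled sequence)."""
    fields = sam_line.rstrip("\n").split("\t")
    read_name, seq = fields[0], fields[9]
    return read_name, pt_tag(sam_line), longest_homopolymer(seq, "A")


if __name__ == "__main__":
    # Made-up record purely for illustration: 20 basecalled A's, pt:i of 25.
    seq = "GGC" + "A" * 20 + "TTG"
    line = "\t".join(["read1", "4", "*", "0", "0", "*", "*", "0", "0", seq, "*", "pt:i:25"])
    print(compare(line))
```

If the reads show the tail as a run of T instead, pass `"T"` as the base to `longest_homopolymer`.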

Suchi-alt commented 2 weeks ago

Yes @malton-ont , I’ve been working with the raw POD5 file input and was examining why the basecalled reads in the output seemed to differ. I adjusted the flank thresholds and the tail_interrupt_length, as the plasmid I’m working with has a homopolyA region of 120 bases. I wasn’t able to get any output using --estimate-poly-a, so I’m unsure how to validate this length as well.

malton-ont commented 2 weeks ago

@Suchi-alt,

If you're calling plasmids you should be setting plasmid_front_flank and plasmid_rear_flank rather than the primers. See the docs here.

Suchi-alt commented 2 weeks ago

So the sequences directly adjacent to the polyA (at its 5' and 3' ends) are used as the front and rear flanks, and not the reverse complement of these adjacent sequences, right? This is what I understood from that page.

malton-ont commented 2 weeks ago

Yes, that is correct.
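To illustrate the point confirmed above, here is a minimal Python sketch that pulls the two flanks from the same strand as the polyA, without reverse complementing. It assumes the plasmid sequence is available as a linear string containing one dominant A run; `plasmid_flanks` and `to_toml` are hypothetical helpers, the flank length is arbitrary, and the `[anchors]` table name is an assumption that should be checked against the docs. Only the `plasmid_front_flank` and `plasmid_rear_flank` keys come from this thread.

```python
import re


def plasmid_flanks(plasmid_seq, flank_len=10):
    """Return (front_flank, rear_flank, tail_length) around the longest A run.

    The flanks are read off the same strand as the A run, as-is.
    Wrap-around on a circular plasmid is not handled in this sketch.
    """
    seq = plasmid_seq.upper()
    matches = list(re.finditer("A+", seq))
    if not matches:
        raise ValueError("no A run found in the sequence")
    longest = max(matches, key=lambda m: m.end() - m.start())
    start, end = longest.start(), longest.end()
    front = seq[max(0, start - flank_len):start]  # bases just 5' of the polyA
    rear = seq[end:end + flank_len]               # bases just 3' of the polyA
    return front, rear, end - start


def to_toml(front, rear):
    """Format the flanks as config lines; the [anchors] table name is assumed."""
    return (
        "[anchors]\n"
        f'plasmid_front_flank = "{front}"\n'
        f'plasmid_rear_flank = "{rear}"\n'
    )


if __name__ == "__main__":
    # Made-up sequence purely for illustration.
    example = "CCGGTTCGAT" + "A" * 30 + "GCTTGACCGG"
    front, rear, tail_len = plasmid_flanks(example, flank_len=6)
    print(tail_len)
    print(to_toml(front, rear))
```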

Suchi-alt commented 2 weeks ago

Thank you so much :)