gymrek-lab / LongTR

Tandem repeat genotyping with long reads
GNU General Public License v2.0

Low base quality score #3

Closed. wdecoster closed this issue 4 months ago

wdecoster commented 4 months ago

Hi,

Most of our (ONT R9) reads overlapping a locus get dropped because of a low base quality score, and we are a bit confused about how to fix that. I naively first tried setting --min-sum-qual to 10, then to 1 and 0, and even to -1000. Only the last value seemed to change anything in the filtering, but even then, for a few remaining loci, most reads were dropped. Did I misunderstand this option, or could this be a bug?

Thanks, Wouter

heliziii commented 4 months ago

Dear Wouter,

Thank you for your interest in LongTR. The --min-sum-qual threshold is compared against the sum of the log values of the quality scores across all bases in a read, so it is a negative number, and for longer reads it is large in magnitude. To keep all reads, please use a very large negative number (such as -1e10). I will change the default value for now, and we will work on a better quality measure for longer reads. Thank you for raising this issue.
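To illustrate why the sum grows so negative for long reads, here is a minimal sketch in Python. It is not LongTR's actual code: it assumes the per-base term is log10 of the probability that the base call is correct, derived from the Phred score, and the function names `sum_log_qual` and `passes_filter` are hypothetical. The exact log base and formula used internally may differ.

```python
import math


def sum_log_qual(phred_scores, floor_prob=1e-10):
    """Sum of log10(P(base is correct)) over a read.

    Assumption for illustration: P(correct) = 1 - 10^(-Q/10).
    floor_prob avoids log10(0) when Q = 0.
    """
    total = 0.0
    for q in phred_scores:
        p_error = 10 ** (-q / 10)
        p_correct = max(1.0 - p_error, floor_prob)
        total += math.log10(p_correct)
    return total


def passes_filter(phred_scores, min_sum_qual):
    """Keep the read only if its summed log quality reaches the threshold."""
    return sum_log_qual(phred_scores) >= min_sum_qual


# Two hypothetical reads with a uniform base quality of Q10.
short_read = [10] * 1_000
long_read = [10] * 50_000

for name, read in [("short", short_read), ("long", long_read)]:
    print(name, round(sum_log_qual(read), 1),
          "kept at -1000:", passes_filter(read, -1000),
          "kept at -1e10:", passes_filter(read, -1e10))
```

Under these assumptions every base contributes a small negative term, so the sum scales with read length: a threshold of 0 or above rejects every read, -1000 keeps the short read but still drops the long one, and only a very large negative value such as -1e10 keeps both.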

Best, Helia

wdecoster commented 4 months ago

Ah! The documentation made me think it would be in the Phred scale. Thanks for the quick reply.