Closed — LeeBergstrand closed this 1 month ago
Looks like the quality scores go down over the read length, though for some reads the quality comes back up.
@jmtsuji I ran stats on the assemblies of 32 genomes from one of our internal datasets. All these samples used older flowcell and base caller versions, so I had to trim them using a Q score of 8 (I tried many Q score cut-offs, and this one gave the best assemblies) and assemble them using Flye's nano-raw mode. Here are the results:
There was not much difference between dropping reads with an average Q score below X and end trimming reads to a Q score of X. I assumed you would get slightly more data into the assembly with end trimming, but it does not appear to make much of a difference. @jmtsuji Thoughts?
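For reference, here is a minimal Python sketch of the two strategies being compared. This is not the pipeline's actual code, and BBDuk's internal algorithms differ (e.g. its quality trimming uses the Phred algorithm rather than a simple scan); the threshold of 8 is just the cut-off mentioned above.

```python
import math

def avg_quality(quals):
    """Average Phred score of a read, computed via the mean error
    probability (one reasonable definition; BBDuk's exact computation
    may differ)."""
    probs = [10 ** (-q / 10) for q in quals]
    mean_p = sum(probs) / len(probs)
    return -10 * math.log10(mean_p)

def filter_by_avg_quality(quals, threshold=8):
    """minavgquality-style filtering: keep or drop the whole read."""
    return quals if avg_quality(quals) >= threshold else None

def end_trim(quals, trimq=8):
    """Simplified qtrim=r-style trimming: strip low-quality bases
    from the 3' end until a base at or above trimq is reached."""
    end = len(quals)
    while end > 0 and quals[end - 1] < trimq:
        end -= 1
    return quals[:end]
```

The trade-off discussed above falls out of these two functions: end trimming salvages the high-quality prefix of a read that whole-read filtering would discard, at the cost of keeping shorter reads.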
From Mike Lynch:
End trimming makes sense in environments like Illumina and 454/pyrosequencing where enzyme decay puts Q-score recovery in question. It was my understanding that nanopore (and definitely PacBio) don’t have the same issue. That quality decay can easily recover (in some ways the Q-score of a set of nucleotide calls is at least partially independent of previous Q-scores - this obviously doesn’t hold if the Q-score decay is due to template issues).
So, keep average score trimming, which should be more representative of nanopore error profiles.
I feel we can close this.
For rule `nanopore_qc_filter`: Why are we setting `minavgquality` versus parameters such as `qtrim` and `trimq`? @jmtsuji Would it make sense to `qtrim` the reads and then filter by length rather than dropping reads based on `minavgquality`? Or are nanopore error profiles not conducive to end trimming?