rrwick / Porechop

adapter trimmer for Oxford Nanopore reads
GNU General Public License v3.0

FASTQ output qual encodings are all "+" #3

Closed (nextgenusfs closed 6 years ago)

nextgenusfs commented 7 years ago

When working with FASTQ, it seems the quality scores are not being retained. Here is the original read:

@channel_37_d6b322e5-b47b-4400-b152-ab15b84b12ce_template /Library/MinKNOW/data/reads/fail/0/cfmrs_mac_pro_fpl_fs_fed_us_20170307_FNFAF13190_MN18073_mux_scan_SWJ_Anid_54932_ch37_read188_strand.fast5
TCTGGTGTGCTTCGTTCAGTTACGTATTGTAAGGTTAACACAAGACGCCGTGGCTTTCTTCAGCACCTGGTCTCTCCGACCAGGTCTTTAATGTGGTGTTCATTAATTCTCCGATCGTCTTGGTCCATGGTGCATGCATCGCGTGCTGGGATCTTTGCCCTTCTCGCAAACGCAAATCCGCACTGATCTCGACAAGCATCATCAACAGCCAGCGGACTATAAGCCACAGCGGCATGCAGCGCAGTCTCCCATCTCCGTTGGCCGTAATGTTGAAACAACTCGGCTGACCCTCGGATGATTCGCTAACCAGCGGTCCCCTTCATGCCAAGATGCACGGCTATCCTCTTCTCGTCCTGGGGTCAGCATCTGCCTCCTTCGTCGCCGCTACCGGCGGACATCTGCATCCGTTCTCTGCGGTTGCGAGGACCGCGGGTGCGCGTTGGTGGCTCGCGAAGGGCGCGCTGATCATCCTGGCCGCGGTGGGCCATGCGGGATGGAGATCTGCGCGCGCACGCGCGTACGCGCTA
+
##$'('$.(.101+3/0)'++&(&*$%(#%'&,-1+-**#$$*)*1%&'&*+)'*.-/3.-%'')-++,$&,.,(/.*'%'$%'&#*+).02/1.)(&)(%&$)$$'),,,1/112-*)%$%*()+&$#)*&&$''*)*3-,0/,./01,&$'&*,*,2+*,--0('/1..#$$$()+-*,(+,1/2+(-&+2++)+**))(.(,1%+0((&.&)+**)+,'(.2)/*'+'),+1-*))*$(&%(*(*3)-',*2/)*.%''((&%$+*.2.,+-))&$&(#$##%&25++-+1/,*(-1*))*+0,/(*+,*)..-)'+*.)$*)(&*()%+))*,,(%(00--0/,,/.)+%$%/1-,-.'0+*)#%./*0./31/4,(+./11)..,),(.*)(*(&&&$*31/-+,-.*-/*/1/0(&#'(&%&&&'&&*++,(('&)))&&'&'&%($&'**1+,)%%+)(*))))-0.1.((*&((+'')#$%)',)'+(.'**++'+&')((&$#%$&%%###&$$#%"#

And here is the "trimmed" output read:

@channel_37_d6b322e5-b47b-4400-b152-ab15b84b12ce_template /Library/MinKNOW/data/reads/fail/0/cfmrs_mac_pro_fpl_fs_fed_us_20170307_FNFAF13190_MN18073_mux_scan_SWJ_Anid_54932_ch37_read188_strand.fast5
GTCTCTCCGACCAGGTCTTTAATGTGGTGTTCATTAATTCTCCGATCGTCTTGGTCCATGGTGCATGCATCGCGTGCTGGGATCTTTGCCCTTCTCGCAAACGCAAATCCGCACTGATCTCGACAAGCATCATCAACAGCCAGCGGACTATAAGCCACAGCGGCATGCAGCGCAGTCTCCCATCTCCGTTGGCCGTAATGTTGAAACAACTCGGCTGACCCTCGGATGATTCGCTAACCAGCGGTCCCCTTCATGCCAAGATGCACGGCTATCCTCTTCTCGTCCTGGGGTCAGCATCTGCCTCCTTCGTCGCCGCTACCGGCGGACATCTGCATCCGTTCTCTGCGGTTGCGAGGACCGCGGGTGCGCGTTGGTGGCTCGCGAAGGGCGCGCTGATCATCCTGGCCGCGGTGGGCCATGCGGGATGGAGATCTGCGCGCGCACGCGCGTACGCGCTA
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

rrwick commented 7 years ago

Hmmm, this one is mysterious to me. I made + the default quality character (a Phred score of 10 in the standard Phred+33 encoding, i.e. 90% accurate). But it should only appear if your input is FASTA and your output is FASTQ, which is an odd scenario. If both your input and output are FASTQ (much more typical), then that shouldn't happen, so I suspect you've found a bug.
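For reference, the arithmetic behind "+ means Phred 10" can be checked with a short sketch. It assumes the standard Phred+33 (Sanger) encoding. The function names are hypothetical and not part of Porechop.

```python
def phred33_to_quality(char: str) -> int:
    """Decode one FASTQ quality character, assuming Phred+33 encoding."""
    return ord(char) - 33


def error_probability(quality: int) -> float:
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-quality / 10)


if __name__ == "__main__":
    q = phred33_to_quality("+")   # ASCII 43 minus the offset of 33
    p = error_probability(q)      # Phred 10 corresponds to a 1-in-10 error rate
    print(f"quality={q}, error probability={p}, accuracy={1 - p:.0%}")
```

A quality line made up entirely of + therefore claims every base is exactly 90% accurate, which is why it stands out against the varied scores in the original read.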

I can't reproduce the problem myself. If I make a broken FASTQ with empty quality lines, then I get the ++++++++, but not otherwise. Even your example read works. If I use --adapter_threshold 75 (to ensure this one read is sufficient for finding the NB01 adapter) then I get this:

@channel_37_d6b322e5-b47b-4400-b152-ab15b84b12ce_template /Library/MinKNOW/data/reads/fail/0/cfmrs_mac_pro_fpl_fs_fed_us_20170307_FNFAF13190_MN18073_mux_scan_SWJ_Anid_54932_ch37_read188_strand.fast5
GTCTCTCCGACCAGGTCTTTAATGTGGTGTTCATTAATTCTCCGATCGTCTTGGTCCATGGTGCATGCATCGCGTGCTGGGATCTTTGCCCTTCTCGCAAACGCAAATCCGCACTGATCTCGACAAGCATCATCAACAGCCAGCGGACTATAAGCCACAGCGGCATGCAGCGCAGTCTCCCATCTCCGTTGGCCGTAATGTTGAAACAACTCGGCTGACCCTCGGATGATTCGCTAACCAGCGGTCCCCTTCATGCCAAGATGCACGGCTATCCTCTTCTCGTCCTGGGGTCAGCATCTGCCTCCTTCGTCGCCGCTACCGGCGGACATCTGCATCCGTTCTCTGCGGTTGCGAGGACCGCGGGTGCGCGTTGGTGGCTCGCGAAGGGCGCGCTGATCATCCTGGCCGCGGTGGGCCATGCGGGATGGAGATCTGCGCGCGCACGCGCGTACGCGCTA
+
$&,.,(/.*'%'$%'&#*+).02/1.)(&)(%&$)$$'),,,1/112-*)%$%*()+&$#)*&&$''*)*3-,0/,./01,&$'&*,*,2+*,--0('/1..#$$$()+-*,(+,1/2+(-&+2++)+**))(.(,1%+0((&.&)+**)+,'(.2)/*'+'),+1-*))*$(&%(*(*3)-',*2/)*.%''((&%$+*.2.,+-))&$&(#$##%&25++-+1/,*(-1*))*+0,/(*+,*)..-)'+*.)$*)(&*()%+))*,,(%(00--0/,,/.)+%$%/1-,-.'0+*)#%./*0./31/4,(+./11)..,),(.*)(*(&&&$*31/-+,-.*-/*/1/0(&#'(&%&&&'&&*++,(('&)))&&'&'&%($&'**1+,)%%+)(*))))-0.1.((*&((+'')#$%)',)'+(.'**++'+&')((&$#%$&%%###&$$#%"#

Perhaps Porechop is making a mistake when parsing your FASTQ? Does the file contain strange line ending characters or something? If there are no data privacy issues with doing so, could you share the reads with me? Or perhaps a subset of the reads which has the same problem?
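One way to test the parsing hypothesis before sharing any reads is to scan the input file for the two things mentioned above: empty quality lines and unusual line endings. The following is only a rough sketch and not Porechop's own parser. It assumes a plain four-line-per-record FASTQ (no wrapped sequences), and the function name find_fastq_problems is hypothetical.

```python
from typing import Iterable, List, Tuple


def find_fastq_problems(raw_lines: Iterable[bytes]) -> List[Tuple[int, str]]:
    """Return (record number, problem) pairs for a four-line-per-record FASTQ.

    Lines must be bytes (file opened in "rb" mode) so that carriage
    returns are not silently normalised away before they can be seen.
    """
    problems = []
    lines = list(raw_lines)
    for start in range(0, len(lines), 4):
        record = lines[start:start + 4]
        record_no = start // 4 + 1
        if len(record) < 4:
            # Also triggered by stray blank lines at the end of the file.
            problems.append((record_no, "incomplete record"))
            break
        if any(b"\r" in line for line in record):
            problems.append((record_no, "carriage return in record"))
        header, seq, sep, qual = (line.rstrip(b"\r\n") for line in record)
        if not header.startswith(b"@"):
            problems.append((record_no, "header does not start with @"))
        if not sep.startswith(b"+"):
            problems.append((record_no, "separator does not start with +"))
        if len(qual) == 0:
            problems.append((record_no, "empty quality line"))
        elif len(qual) != len(seq):
            problems.append((record_no, "quality and sequence lengths differ"))
    return problems


if __name__ == "__main__":
    # In-memory example: Windows line endings plus an empty quality line.
    example = [b"@read1\r\n", b"ACGT\r\n", b"+\r\n", b"\r\n"]
    for record_no, problem in find_fastq_problems(example):
        print(f"record {record_no}: {problem}")
```

For a real file, something like `with open(path, "rb") as handle: find_fastq_problems(handle)` should work, where path is whatever FASTQ was given to Porechop. An empty result would suggest the input is well formed and that the problem lies elsewhere.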

Thanks, Ryan

rrwick commented 6 years ago

This is an old issue, so I'm going to close it now. But please let me know if you're still experiencing unresolved issues!

Ryan