relipmoc / skewer

quality encoding auto detection fails on PE files merged with SeqPrep #12

Open andreas-wilm opened 9 years ago

andreas-wilm commented 9 years ago

SeqPrep merges overlapping paired-end FASTQ files. It raises the quality scores in the overlapping region accordingly (e.g. two agreeing bases with Q30 each result in a Q60 base in the output). Skewer incorrectly recognises the quality encoding of SeqPrep output files as 'Solexa/Illumina 1.3+/Illumina 1.5+' instead of the correct 'Sanger/Illumina 1.8+'.

Andreas
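
To make the arithmetic concrete, here is a minimal sketch of how a Phred score maps to a FASTQ quality character under the two offsets in question. The helper names `phred_to_char` and `char_to_phred` are made up for illustration and are not part of SeqPrep or Skewer.

```python
# Offsets used by the two encodings discussed in this issue.
SANGER_OFFSET = 33    # Sanger / Illumina 1.8+
ILLUMINA_OFFSET = 64  # Illumina 1.3+ / 1.5+


def phred_to_char(q, offset=SANGER_OFFSET):
    """Encode a Phred score as a single FASTQ quality character."""
    return chr(q + offset)


def char_to_phred(c, offset=SANGER_OFFSET):
    """Decode a FASTQ quality character back to a Phred score."""
    return ord(c) - offset


# Two agreeing Q30 bases merged into one Q60 base, as described above.
merged = phred_to_char(30 + 30)
print(merged)                                  # ]
print(char_to_phred(merged, ILLUMINA_OFFSET))  # 29
```

The same character `]` reads as Q60 under offset 33 and as a plausible Q29 under offset 64, which is presumably why a detector that looks mainly at high characters is misled.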

relipmoc commented 9 years ago

Could you paste the output files that were incorrectly recognised? Maybe the first several pairs of reads are enough. Thanks!

andreas-wilm commented 9 years ago

Here's an example (after processing with SeqPrep) that is correctly detected by FastQC but not by Skewer:

@M01853:160:000000000-ADF3H:1:1101:15279:1355 1:N:0:6
GTCCGTCCTCCTCCTCCCCCGTCTCCGCCCCCCGGCCCCGCGTCCTCCCTCGGGAGGGCGCGCGGGTCGGGGCGGCGGCCTGGGGGCGGGGAGCGGTCGGGCGGCGGCGGTCGGCGGGCTGTAGGCACCATCAAT
+
]]]]]]]]]]]S]SS]]]]]]]S]]T]]]]]S]]]]]]]]]]]]]]]]]]]]]]Q]]]]]]]]]]]U]]]]]]]]]]]]]]]]]]]]]]]]]]]]]9]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]

Andreas

relipmoc commented 9 years ago

Hi Andreas, according to http://en.wikipedia.org/wiki/FASTQ_format, the quality value cannot be greater than 41. In this sense, Q60 is a non-standard scheme. However, we may be able to support this scheme in the future.
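
For illustration, one way such a scheme could be accommodated is to check the lowest quality character before the highest one. The sketch below is an assumption about a possible heuristic, not Skewer's actual detection code; `guess_offset` is a hypothetical name. It relies on the fact that the lowest character in the Solexa and Illumina 1.3+ encodings is `;` (ASCII 59), so anything below that can only be offset 33.

```python
def guess_offset(quality_strings):
    """Guess the Phred offset (33 or 64) from FASTQ quality strings.

    Returns None when the observed characters fit both encodings.
    Hypothetical heuristic for illustration only.
    """
    codes = [ord(c) for q in quality_strings for c in q]
    lo, hi = min(codes), max(codes)
    if lo < 59:
        # Below ';', the lowest Solexa character: must be offset 33,
        # no matter how high the top scores are.
        return 33
    if hi > 74:
        # Above 'J' (Q41 under offset 33) with no low characters seen:
        # assume offset 64.
        return 64
    return None


# Shortened stand-in for the quality line posted above, which contains
# a '9' (ASCII 57) among the ']' characters.
print(guess_offset(["]]]]S]]T]]9]]]"]))
```

With this ordering, a single low character such as the `9` in the example read is enough to settle on offset 33. A read made up only of high characters would still be ambiguous, so in practice more than one read would need to be sampled.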