lh3 / bfc

High-performance error correction for Illumina resequencing data
MIT License

phred33 to 64 encoding change #11

Open dougwyu opened 9 years ago

dougwyu commented 9 years ago

Hi,

We have some Illumina 1.5 (phred64) files that we need to process. We used Trimmomatic's tophred33 option to convert them to phred33 encoding, the idea being to do all downstream analysis in phred33. However, after running the converted reads through bfc, some of the reads were converted back to phred64. Here is an example. The conversion back to phred64 happens in the reads that get "ec:Z:3" added to the header line.

original read

@FCC3PRVACXX:5:1101:5113:1970#AGCGCTAG/1
NACACCGGCACCCTTAAAATTCTACGGTATCGATTTCGGAGACAGCAGACTTTTGGAATTATACGTATTCCGTTTATATGTGTTCTGTAAGCAGTTTTAT
+
BPaceeeegggggiiiihiiiiiiiiiegiiifhiiiiihifghiiiihigggggeeeeedddddbdccedcbcccccedcceccdcddcbcccccdccc

post Trimmomatic tophred33 (leading N also trimmed)

@FCC3PRVACXX:5:1101:5113:1970#AGCGCTAG/1
ACACCGGCACCCTTAAAATTCTACGGTATCGATTTCGGAGACAGCAGACTTTTGGAATTATACGTATTCCGTTTATATGTGTTCTGTAAGCAGTTTTAT
+
1BDFFFFHHHHHJJJJIJJJJJJJJJFHJJJGIJJJJJIJGHIJJJJIJHHHHHFFFFFEEEEECEDDFEDCDDDDDFEDDFDDEDEEDCDDDDDEDDD

post bfc

@FCC3PRVACXX:5:1101:5113:1970#AGCGCTAG/1 ec:Z:3
ACACCGGCACCCTTAAAATTCTACGGTATCGATTTCGGAGACAGCAGACTTTTGGAATTATACGTATTCCGTTTATATGTGTTCTGTAAGCAGTTTTAT
+
Paceeeegggggiiiihiiiiiiiiiegiiifhiiiiihifghiiiihigggggeeeeedddddbdccedcbcccccedcceccdcddcbcccccdccc
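
For anyone wanting to check which records were shifted, the quality characters can be inspected directly. The following is a minimal Python sketch and not part of bfc or Trimmomatic; the function names `guess_offset` and `to_phred33` are hypothetical, and the thresholds are a common heuristic (phred64 data never contains characters below ASCII 59, and Illumina phred33 data normally stays at or below 'J'), so treat the result as a guess rather than a guarantee.

```python
from typing import Optional


def guess_offset(qual: str) -> Optional[int]:
    """Guess the ASCII offset (33 or 64) of one FASTQ quality string.

    Heuristic only: returns None when the characters fit both encodings.
    """
    codes = [ord(c) for c in qual]
    if min(codes) < 59:   # below ';' is impossible in a +64 encoding
        return 33
    if max(codes) > 74:   # above 'J' is unusual for Illumina phred33
        return 64
    return None


def to_phred33(qual: str) -> str:
    """Shift a phred64 quality string down by 31 to get phred33."""
    return "".join(chr(ord(c) - 31) for c in qual)


# First 12 quality characters of the reads shown above
after_trimmomatic = "1BDFFFFHHHHH"
after_bfc = "Paceeeeggggg"

print(guess_offset(after_trimmomatic))  # offset guessed for the Trimmomatic output
print(guess_offset(after_bfc))          # offset guessed for the bfc output
print(to_phred33(after_bfc) == after_trimmomatic)
```

On the prefixes above, the bfc output shifted down by 31 matches the Trimmomatic output character for character, which is consistent with the quality string having been moved from offset 33 to offset 64.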