itmat / rum

RNA-Seq Unified Mapper
http://cbil.upenn.edu/RUM
MIT License

quals.fa and quality scores #145

Status: Open. nmanik opened this issue 11 years ago.

nmanik commented 11 years ago

I successfully completed a test run and everything seems to work fine!

In the process, though, I noticed something, and I'm not sure whether it is critical (I'm not clear on how RUM uses quality scores for alignment or reporting). In the intermediate file outputdir/quals.fa, the forward and reverse labels are swapped. Each pair appears in b,a order, and the quality string labelled "a" actually belongs to read 2 (R2) while the one labelled "b" belongs to read 1 (R1). In outputdir/reads.fa the pairs appear in a,b order with the correct labels.

Here is a sample, followed by the input FASTQ files, which show the correct quality scores:

$ head outputdir/quals.fa -n8
>seq.1b
#13./43544??9?;9:64:576<69599<@@@@@6????@@@@########################################################
>seq.1a
DDD4DCCD:DEBB3EDEEB?BC<C:ECBECD?EBBA@7,76?6=16A85A##################################################
>seq.2b
#8548::<7:DDDDD@DDDDD:6=;??75??????@=@7@DDDDDD@DD==????@@@@@?6???DDB@==<4>=@@@@@DD664>79=394949<<<<:
>seq.2a
HHHHDHHHHHHHHHGBFEEFFHHFHFHCFHHHHHHHHBDBFFEFDFFHHHHDEHHHHHHHFCDFBEEE>FCFEFC?D.?;6<6?=A>CCC##########

$ head outputdir/reads.fa -n8
>seq.1a
NGCGCGTCTTGTCTGCTGCAGCATCGTTCTGTGTTGTCTCTGTCTGACTGTGTTTCTGTATTTGTCTGAAAATATGGGCCAGACTGTTACCACTCCCTTA
>seq.1b
GGGTGATGAGGTCTCGGTTAAAGGTGCCGTCTCGCGGCCATCCGACGTTAAAGGGTGGCCATTCTGCAGAGCAGAAGGTAACCCAAAGTCTCTTCTTGAC
>seq.2a
NAGCAACATAGTGCCATTTGTTGGTGGGTATGGAACCATCTGAAGCAATCTCTCCAACTTCTAGGTCTAACGAGGACTTATTTGCAACAGTACAGAAGGG
>seq.2b
TTTTAATCTCTCACGAGTAGTCACTCTGACTCCCTTCTGTACTGTTGCAAATAAGTCCTCGTTAGACCTAGAAGTTGGAGAGATTGCTTCAGATGGTTCA

# Input fastq files
$ head fastq/test_R1.fastq -n8
@HWI-ST396:86:A819KWABXX:7:1:1352:1818 1:N:0:
NGCGCGTCTTGTCTGCTGCAGCATCGTTCTGTGTTGTCTCTGTCTGACTGTGTTTCTGTATTTGTCTGAAAATATGGGCCAGACTGTTACCACTCCCTTA
+
#13./43544??9?;9:64:576<69599<@@@@@6????@@@@########################################################
@HWI-ST396:86:A819KWABXX:7:1:1329:1831 1:N:0:
NAGCAACATAGTGCCATTTGTTGGTGGGTATGGAACCATCTGAAGCAATCTCTCCAACTTCTAGGTCTAACGAGGACTTATTTGCAACAGTACAGAAGGG
+
#8548::<7:DDDDD@DDDDD:6=;??75??????@=@7@DDDDDD@DD==????@@@@@?6???DDB@==<4>=@@@@@DD664>79=394949<<<<:

$ head fastq/test_R2.fastq -n8
@HWI-ST396:86:A819KWABXX:7:1:1352:1818 2:N:0:
GGGTGATGAGGTCTCGGTTAAAGGTGCCGTCTCGCGGCCATCCGACGTTAAAGGGTGGCCATTCTGCAGAGCAGAAGGTAACCCAAAGTCTCTTCTTGAC
+
DDD4DCCD:DEBB3EDEEB?BC<C:ECBECD?EBBA@7,76?6=16A85A##################################################
@HWI-ST396:86:A819KWABXX:7:1:1329:1831 2:N:0:
TTTTAATCTCTCACGAGTAGTCACTCTGACTCCCTTCTGTACTGTTGCAAATAAGTCCTCGTTAGACCTAGAAGTTGGAGAGATTGCTTCAGATGGTTCA
+
HHHHDHHHHHHHHHGBFEEFFHHFHFHCFHHHHHHHHBDBFFEFDFFHHHHDEHHHHHHHFCDFBEEE>FCFEFC?D.?;6<6?=A>CCC##########
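
To check whether the swap affects every pair rather than just the first few, the quality strings in quals.fa can be compared record by record against the FASTQ inputs. The Python below is a minimal, hypothetical sketch (the function names are illustrative and not part of RUM). It assumes the layouts shown above: four lines per FASTQ record, exactly one header line and one sequence line per FASTA record, and that seq.N refers to the N-th record of each input file, with suffix "a" for R1 and "b" for R2 as in reads.fa.

```python
from pathlib import Path


def read_fastq_quals(path):
    """Return the quality strings of a FASTQ file (4 lines per record), in file order."""
    lines = Path(path).read_text().splitlines()
    # Each record is: @header, sequence, '+', quality. Parsing by position
    # avoids mistaking a quality line that starts with '@' for a header.
    return [lines[i + 3] for i in range(0, len(lines) - 3, 4)]


def read_paired_fasta(path):
    """Return {name: string} for a FASTA file of strict header/sequence line pairs."""
    lines = Path(path).read_text().splitlines()
    # Parsing by position means a quality string starting with '>' (Phred+33 Q29)
    # is not mistaken for a new header.
    return {lines[i][1:].strip(): lines[i + 1] for i in range(0, len(lines) - 1, 2)}


def check_quals(quals_fa, r1_fastq, r2_fastq):
    """Return (name, problem) pairs where quals.fa disagrees with the FASTQ inputs.

    Assumption: "seq.N" is the N-th record (1-based) of each FASTQ file,
    with suffix "a" for R1 and "b" for R2, as in reads.fa.
    """
    quals = read_paired_fasta(quals_fa)
    problems = []
    for suffix, fastq in (("a", r1_fastq), ("b", r2_fastq)):
        for n, expected in enumerate(read_fastq_quals(fastq), start=1):
            name = f"seq.{n}{suffix}"
            if name not in quals:
                problems.append((name, "missing"))
            elif quals[name] != expected:
                problems.append((name, "mismatch"))
    return problems
```

Called as check_quals("outputdir/quals.fa", "fastq/test_R1.fastq", "fastq/test_R2.fastq"), a correctly labelled quals.fa would return an empty list; for the two pairs shown above, all four entries would be reported as mismatches.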

mdelaurentis commented 11 years ago

That's odd.

I don't believe the quality scores are used in any way. RUM includes them in the final RUM.sam file, but I'm almost positive it doesn't do any calculations that use them. I'll check whether this results in mixed-up quality scores in the SAM file.
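
A check of the final output along these lines could compare the QUAL column (field 11) of RUM.sam with the FASTQ inputs. Per the SAM specification, flag bit 0x40 marks the first read of a pair, 0x80 the second, and 0x10 means SEQ and QUAL are stored reverse-complemented (QUAL reversed), so the string is flipped back before comparing. This is a hypothetical Python sketch: it assumes the read number can be recovered from QNAME as "seq.N" (RUM's actual QNAME format is not shown in this thread) and that reads are not hard-clipped; records that don't match that pattern are skipped.

```python
import re


def check_sam_quals(sam_path, r1_quals, r2_quals):
    """Return (qname, mate) pairs whose SAM QUAL differs from the FASTQ input.

    r1_quals / r2_quals: quality strings in FASTQ file order (index 0 = read 1).
    Hypothetical assumption: QNAME contains the read number as "seq.N".
    """
    problems = []
    with open(sam_path) as fh:
        for line in fh:
            if line.startswith("@"):
                continue  # skip SAM header lines
            fields = line.rstrip("\n").split("\t")
            qname, flag, qual = fields[0], int(fields[1]), fields[10]
            match = re.search(r"seq\.(\d+)", qname)
            if qual == "*" or match is None:
                continue  # no stored qualities, or QNAME not in the assumed form
            n = int(match.group(1)) - 1
            if flag & 0x40:  # first segment of the pair (R1)
                expected, mate = r1_quals[n], "R1"
            elif flag & 0x80:  # last segment of the pair (R2)
                expected, mate = r2_quals[n], "R2"
            else:
                continue  # not a paired record
            if flag & 0x10:
                qual = qual[::-1]  # reverse strand: SAM stores QUAL reversed
            if qual != expected:
                problems.append((qname, mate))
    return problems
```

The expected lists can be taken from the fourth line of each FASTQ record; an empty result would suggest the swap in quals.fa does not carry through to RUM.sam.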