itmat / rum

RNA-Seq Unified Mapper
http://cbil.upenn.edu/RUM
MIT License

Proper SAM formatting #180

Open choishingwan opened 10 years ago

choishingwan commented 10 years ago

Hi there. The SAM file generated by RUM is missing the tag for unique mapping (NH:i:1), so there is no information on whether a read is uniquely aligned (except in the separate SAM file). Is it possible for RUM to provide this information in the final output instead of putting it in a separate SAM file?

delagoya commented 10 years ago

Hi, thanks for submitting the issue. If you have a specific example of the actual output and what it should look like, it would help us debug the issue.

choishingwan commented 10 years ago

One example is as follows. The RUM output:

seq.1 83 2 125247861 25 36M1269N65M = 125247852 -1378 CGGGTCCGCGCGCGCTGCCGGCTACGACCTATTCAGTGCCTATGATTATACAATATCACCCATGGAGAAAGCCATCGTGAAGACAGACATTCAGATAGCTG @@BDDDDDBFD8F:A:EFH<:AFH;?FFIFFI>F@BB@CGE9::??FG???DB<4?B4BFCFFF84)7C=..:BEDBD69>>/<8?><<>9509@@9 XO:A:T MD:Z:101 NM:i:0 IH:i:1 HI:i:1 XS:A:+

Yet in order to let other software (e.g. MATS) know that the alignment is unique, we need the NH:i:1 tag, like:

seq.1 83 2 125247861 25 36M1269N65M = 125247852 -1378 CGGGTCCGCGCGCGCTGCCGGCTACGACCTATTCAGTGCCTATGATTATACAATATCACCCATGGAGAAAGCCATCGTGAAGACAGACATTCAGATAGCTG @@BDDDDDBFD8F:A:EFH<:AFH;?FFIFFI>F@BB@CGE9::??FG???DB<4?B4BFCFFF84)7C=..:BEDBD69>>/<8?><<>9509@@9 XO:A:T MD:Z:101 NM:i:0 IH:i:1 HI:i:1 XS:A:+ NH:i:1
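
Until RUM writes the tag itself, one possible workaround is to post-process the SAM file and append NH to each record. The following is only a minimal sketch, not anything RUM provides: the function name `add_nh_from_ih` is hypothetical, and it assumes that the IH value RUM already writes is the right hit count to copy into NH, and that the real file is tab-delimited as the SAM format requires.

```python
def add_nh_from_ih(sam_line: str) -> str:
    """Return the SAM record with an NH:i tag copied from its IH:i tag.

    Assumption: the IH value written by the aligner is the hit count
    that downstream tools expect to find in NH.
    """
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line  # header lines are passed through unchanged

    fields = line.split("\t")
    optional = fields[11:]  # the first 11 columns are mandatory SAM fields

    if any(tag.startswith("NH:i:") for tag in optional):
        return line  # NH is already present, nothing to do

    for tag in optional:
        if tag.startswith("IH:i:"):
            return line + "\tNH:i:" + tag[len("IH:i:"):]

    return line  # no IH tag to copy from, leave the record as it is
```

Applied line by line to the RUM output, this would turn the first record above into the second one. Records without an IH tag are left untouched rather than guessed at.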

Also, other programs such as HTSeq require information about multiple alignments, and I am not sure how to tweak the alignment file from RUM to make that work. All I know is that although RUM reported some multiple alignments, none of them were picked up by HTSeq, so I suspect that some flag or tag indicating this is missing.
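
To narrow down which indicator is missing, a quick tally over the file may help. This is a rough diagnostic sketch under assumptions: the helper name `tally_hits` is made up for illustration, the input is assumed to be tab-delimited SAM text, and the guess that a downstream tool looks at either the NH tag or the secondary-alignment flag bit (0x100) has not been confirmed against HTSeq here.

```python
from collections import Counter


def tally_hits(sam_lines):
    """Count records that look multi-mapped and records lacking NH."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        optional = fields[11:]

        # IH is the hit count tag that appears in the RUM record above.
        ih = next((int(t[5:]) for t in optional if t.startswith("IH:i:")), None)
        if ih is not None and ih > 1:
            counts["multi_by_IH"] += 1
        if flag & 0x100:  # SAM flag bit for a secondary alignment
            counts["secondary_flag"] += 1
        if not any(t.startswith("NH:i:") for t in optional):
            counts["missing_NH"] += 1
    return counts
```

If `multi_by_IH` is non-zero while `secondary_flag` is zero and `missing_NH` equals the number of records, that would be consistent with the multiple alignments being present in the file but not marked in a way other tools recognize.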

Thank you