Closed e-manduchi closed 11 years ago
The full SAM specification is located here:
http://samtools.sourceforge.net/SAM1.pdf
Briefly, the first set of columns in a SAM record are required and the 3rd column (RNAME) either contains the reference sequence (e.g. "chr1") if mapped or "*" if it is unmapped. This is the best place to filter for unmapped reads.
-angel
On Wednesday, October 17, 2012 at 4:11 PM, e-manduchi wrote:
Hello, I'm a bit confused about the RUM.sam file produced by the software. From https://github.com/PGFI/rum/wiki/RUM-output-files it says that IH:i:N means the read aligns to N locations. Now, looking at one of my RUM.sam files I noticed that it contains ALL reads from the fastq file (based on the read ids). However not all these reads map. I had assumed that for the unmapped ones I would see IH:i:0, instead there is not such occurrence. Why? What's the most reliable annotation in the RUM.sam which will tell me whether a read mapped or not? Thanks.
— Reply to this email directly or view it on GitHub (https://github.com/PGFI/rum/issues/146).
Thanks. Just found an older message from Greg saying that it was best to use the sixth column, the cigar string, if that's * then didn't align. Still, for these, shouldn't the IH:i be set to 0?
Actually, I would recommend using the second field (the FLAG field). According to the SAM file spec, "Bit 0x4 is the only reliable place to tell whether the segment is unmapped". So if you're using some program to process the SAM file, you should be able to do something like "$unmapped = $flag & 0x4" to find out of it is unmapped or not.
I don't remember why, but I believe there are some scenarios where the RNAME field can be something other than '*' and the read still didn't map. I think that may happen for paired reads where one maps but its mate doesn't.
On Wed, Oct 17, 2012 at 4:23 PM, e-manduchi notifications@github.comwrote:
Thanks. Just found an older message from Greg saying that it was best to use the sixth column, the cigar string, if that's * then didn't align. Still, for these, shouldn't the IH:i be set to 0?
— Reply to this email directly or view it on GitHubhttps://github.com/PGFI/rum/issues/146#issuecomment-9542357.
Thanks. and I now see that unmapped reads don't have the IH:i at all!
OK, I've updated the documentation to reference the SAM format specification and also an example of filtering using samtools and the bitwise FLAG.
Hello, I'm a bit confused about the RUM.sam file produced by the software. From https://github.com/PGFI/rum/wiki/RUM-output-files it says that IH:i:N means the read aligns to N locations. Now, looking at one of my RUM.sam files I noticed that it contains ALL reads from the fastq file (based on the read ids). However not all these reads map. I had assumed that for the unmapped ones I would see IH:i:0, instead there is not such occurrence. Why? What's the most reliable annotation in the RUM.sam which will tell me whether a read mapped or not? Thanks.