pezmaster31 / bamtools

C++ API & command-line toolkit for working with BAM data
MIT License
418 stars 153 forks

Problem with the order of references #229

Closed johan-gson closed 1 year ago

johan-gson commented 1 year ago

I have a sorted BAM file whose chromosomes are in the same order as in the GTF file (chr1, chr2, etc.). However, when I run my code with the API, the references retrieved from the header come back in the order chr1, chr10, etc., i.e., in alphabetical order. The RefID indices, however, still follow the order in the GTF file, which messes everything up. Before this, I ran the following commands on a coordinate-sorted BAM file from STAR:

samtools collate -o namecollate.bam Aligned.sortedByCoord.out.bam
samtools fixmate -m namecollate.bam fixed.bam
samtools sort -o positionsort.bam fixed.bam
samtools markdup -t positionsort.bam WithDuplSt.bam

So, I think something has gone seriously wrong, but I'm not sure exactly what. At some point, the references have been re-sorted in alphabetical order.

When I ran samtools view on WithDuplSt.bam to convert it into a SAM file (I don't remember the exact call), I got a file with the correct chromosome labels on the reads. This suggests that something goes wrong when the references are read, but I am not sure; it is a bit strange. Maybe people don't use the reference names that much? Could you have a look?
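
For reference, here is a minimal sketch of the lookup I would expect to hold: a read's RefID is an index into the reference list in header order, so the name should come from indexing that list directly. In BamTools the real pieces are BamReader::GetReferenceData() and BamAlignment::RefID; the RefEntry struct and the refNameFor and sortedByName helpers below are hypothetical stand-ins so that the snippet compiles without the library, and nothing here is taken from the BamTools sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for BamTools::RefData (name plus length).
struct RefEntry {
    std::string name;
    std::int32_t length;
};

// Resolve a RefID to a reference name by indexing the header-ordered list.
// Returns "*" for unmapped reads (RefID -1) and for out-of-range IDs.
std::string refNameFor(const std::vector<RefEntry>& refs, std::int32_t refId) {
    if (refId < 0 || static_cast<std::size_t>(refId) >= refs.size()) {
        return "*";
    }
    return refs[static_cast<std::size_t>(refId)].name;
}

// Return a copy of the list sorted alphabetically by name. Indexing this
// copy with a RefID is the kind of mistake that would produce the mismatch
// described above, because the RefIDs still refer to header order.
std::vector<RefEntry> sortedByName(std::vector<RefEntry> refs) {
    std::sort(refs.begin(), refs.end(),
              [](const RefEntry& a, const RefEntry& b) { return a.name < b.name; });
    return refs;
}
```

With an illustrative header list of chr1, chr2, chr10 in that order, refNameFor(header, 1) gives chr2, while the same lookup on sortedByName(header) gives chr10. If the names and RefIDs disagree, it is worth checking whether the list was re-sorted or rebuilt somewhere in the calling code before it is indexed.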

johan-gson commented 1 year ago

Eh, I think I got this wrong after all :)