I have a sorted BAM file that has the chromosomes in the same order as the GTF file (chr1, chr2, etc.). However, when I run my code with the API, the references retrieved from the header come back in the order chr1, chr10, etc., i.e., in alphabetical order. The RefID indices, however, still follow the order in the GTF file, which messes everything up. Before this step, I ran the following commands on the coordinate-sorted BAM file produced by STAR:
```sh
samtools collate -o namecollate.bam Aligned.sortedByCoord.out.bam
samtools fixmate -m namecollate.bam fixed.bam
samtools sort -o positionsort.bam fixed.bam
samtools markdup -t positionsort.bam WithDuplSt.bam
```
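For context, the SAM/BAM specification stores the references in the binary BAM header as an ordered list (`n_ref` entries, each with `l_name`, `name`, `l_ref`), and each record's refID is a 0-based index into that list. A minimal sketch using only the standard library reads that list in file order, so it can be compared with what the API returns. It relies on BGZF being a series of gzip members, which Python's `gzip` module reads as one stream. The function names `read_bam_references` and `list_bam_references` are hypothetical and not part of any library.

```python
import gzip
import struct
from typing import BinaryIO, List, Tuple


def read_bam_references(stream: BinaryIO) -> List[Tuple[str, int]]:
    """Return (name, length) for each reference, in header order.

    The list index of each entry is the refID used by BAM records.
    `stream` must yield the *decompressed* BAM bytes.
    """
    if stream.read(4) != b"BAM\x01":
        raise ValueError("not a BAM stream (bad magic)")
    (l_text,) = struct.unpack("<i", stream.read(4))
    stream.read(l_text)  # skip the plain-text SAM header
    (n_ref,) = struct.unpack("<i", stream.read(4))
    refs = []
    for _ in range(n_ref):
        (l_name,) = struct.unpack("<i", stream.read(4))
        # l_name includes the trailing NUL byte.
        name = stream.read(l_name).rstrip(b"\x00").decode("ascii")
        (l_ref,) = struct.unpack("<i", stream.read(4))
        refs.append((name, l_ref))
    return refs


def list_bam_references(path: str) -> None:
    """Print refID, name and length for every reference in a BAM file."""
    # BGZF blocks are gzip members, so gzip.open decompresses them in sequence.
    with gzip.open(path, "rb") as fh:
        for ref_id, (name, length) in enumerate(read_bam_references(fh)):
            print(ref_id, name, length)
```

Calling `list_bam_references("WithDuplSt.bam")` should normally print the references in the same order as the `@SQ` lines shown by `samtools view -H WithDuplSt.bam`. If it does and the API does not, the reordering most likely happens inside the API rather than in the samtools steps.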
So I think something has gone seriously wrong, but I'm not sure exactly what. At some point, the references seem to have been re-sorted into alphabetical order.
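The order described (chr1, chr10, ...) is exactly what a plain string sort produces. The illustration below assumes, hypothetically, that the names get sorted somewhere (for instance by being stored in a sorted map) while the records keep their original refIDs. Looking up a refID in the sorted list then returns the wrong chromosome. The variable and function names are illustrative only.

```python
from typing import List

# Reference names in header (and GTF) order; a BAM record's refID
# is a 0-based index into this list.
header_order: List[str] = ["chr1", "chr2", "chr3", "chr10", "chr11", "chrX"]

# A plain string sort compares character by character, so "chr10"
# sorts before "chr2".
lexicographic: List[str] = sorted(header_order)


def resolve(ref_id: int, names: List[str]) -> str:
    """Map a refID to a reference name using the given list."""
    return names[ref_id]


for ref_id in range(len(header_order)):
    right = resolve(ref_id, header_order)
    wrong = resolve(ref_id, lexicographic)
    flag = "" if right == wrong else "  <-- mismatch"
    print(f"refID {ref_id}: {right:6} vs {wrong:6}{flag}")
```

Here only refIDs 0 and 5 resolve to the same name in both lists. Every other read would be attributed to the wrong chromosome, which matches the symptom of the RefIDs no longer agreeing with the names.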
When I ran samtools view on WithDuplSt.bam to convert it to a SAM file (I don't remember the exact call), the reads in the resulting file had the correct chromosome labels. This suggests that something goes wrong when the API reads the references, but I am not sure; it is a bit strange. Maybe people don't use the reference names that much? Could you have a look?
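Since samtools view labels the reads correctly, one way to narrow this down is to compare the `@SQ` order in the text header (as printed by `samtools view -H WithDuplSt.bam`) with the reference list the API returns. A minimal sketch, assuming the API's reference names can be collected into a Python list of strings; `sq_names` and `first_mismatch` are hypothetical helpers.

```python
from typing import List, Optional


def sq_names(header_text: str) -> List[str]:
    """Return the SN: value of each @SQ line, in header order."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
                break
    return names


def first_mismatch(expected: List[str], actual: List[str]) -> Optional[int]:
    """Return the first index where the lists differ, or None if identical."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        # One list is a prefix of the other; they differ at the shorter length.
        return min(len(expected), len(actual))
    return None
```

For example, save the header with `samtools view -H WithDuplSt.bam > header.sam`, pass its contents to `sq_names`, and compare the result with the API's list. A non-None result from `first_mismatch` points at the first refID whose name the API resolves differently.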