Sorted vs. unsorted input files for IR quantification

kethselly commented 2 years ago

Hello,

I've been able to create a reference correctly and am now focusing on quantifying intron retention. I previously mapped my RNA-seq reads using STAR on the main Galaxy servers and am trying to determine whether the resulting files are coordinate sorted or unsorted. From what I can tell, they are coordinate sorted, since the header contains the following:

@HD VN:1.4 SO:coordinate

in the first line of the .bam files on Galaxy. It seems, then, that these .bam files are not the correct input for IRFinder. I had a couple of questions I was hoping someone might be able to help with:

1. Is there an easy way to check that these .bam files are coordinate sorted? (Or is that what this header line is telling me?)
2. Can I use a tool like samtools sort -o name_sorted_output.bam -n input.bam to re-sort these .bam files by read name rather than by coordinate?

If re-sorting by name isn't an option, I can use IRFinder's FASTQ mode, but I've already completed other analyses with the .bam files I have from Galaxy and wanted to use those for IRFinder as well.

Thanks so much for any help you might be able to provide.
~Seth

dg520 commented 2 years ago

@kethselly Yes, the SO attribute usually tells you the sort order. To double-check whether a BAM is sorted by name, you can also run something like:

samtools view INPUT.bam | head -n 50 | cut -f1

This prints the read names of the first 50 entries in the BAM. If a paired-end BAM is sorted by name, the two mates of each pair sit on adjacent lines, so you should see every two lines sharing the same read name. Otherwise, it is NOT sorted by name. You can definitely re-sort the BAM by name with the samtools command you suggested above, if needed. I would stick with BAM mode so that the IRFinder results stay consistent and comparable with your other results derived from the same BAM.
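
If you want to script the two checks above rather than eyeball them, here is a rough sketch. The helper names sort_order_from_header and names_are_grouped are made up for illustration and are not part of samtools or IRFinder. The first reads the SO tag from the @HD header line; the second checks that identical read names are contiguous, which also tolerates multi-mapped reads that occupy more than two lines. On only the first few hundred records the second check is a heuristic, not a proof.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of samtools or IRFinder).

# Read SAM header text on stdin and print the SO value from the @HD line.
# Prints "unknown" if there is no @HD line or it carries no SO tag.
sort_order_from_header() {
    awk '
        /^@HD/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
            }
        }
        END { if (!found) print "unknown" }
    '
}

# Read one read name per line on stdin. Succeeds (exit 0) if every name
# forms a single contiguous run, and fails (exit 1) if a name reappears
# after a different name has intervened.
names_are_grouped() {
    awk '
        $1 != prev {
            if ($1 in seen) { bad = 1; exit }
            seen[$1] = 1
            prev = $1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Example usage, assuming samtools is on PATH and INPUT.bam exists:
#   samtools view -H INPUT.bam | sort_order_from_header
#   samtools view INPUT.bam | head -n 200 | cut -f1 | names_are_grouped \
#       && echo "looks name-grouped" || echo "not name-grouped"
```

Keeping the parsing separate from the samtools calls means the helpers can be tried on a pasted header or a list of names without needing a BAM at hand.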

kethselly commented 2 years ago

@dg520 Thanks so much for your reply! I tried the samtools view command and it does look like the .bam files that are output from STAR (on usegalaxy.org) are sorted by coordinate. I'll just try using the samtools sort command and then use them for IRFinder. Thanks again!

williamritchie / IRFinder

Sorted vs. unsorted input files for IR quantification #158