kethselly closed 2 years ago
@kethselly Yes, the SO attribute in the @HD header line usually indicates the sort order. To double-check whether a BAM is sorted by name, you can also run something like:
samtools view INPUT.bam|head -n50|cut -f1
This prints the read names of the first 50 entries in the BAM. If the BAM is sorted by name (and the data are paired-end), every two consecutive lines should share the same read name. Otherwise, it is NOT sorted by name.
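Checking 50 names by eye can be error-prone, so the same check can be summarized with a small helper. This is only a sketch: `summarize_name_runs` is a hypothetical function name, not part of samtools or IRFinder. It reads read names from standard input and counts runs of identical consecutive names. It tests whether mates are adjacent, not whether names follow samtools' exact sort order.

```shell
# Hypothetical helper (not part of samtools or IRFinder).
# Reads one read name per line on stdin and reports:
#   runs       number of runs of identical consecutive names
#   singletons runs of length 1 (a read whose mate is not adjacent)
#   reappear   runs whose name was already seen earlier (mates split apart)
summarize_name_runs() {
  awk '
    $1 != prev {
      if (NR > 1 && len == 1) singletons++   # previous run had one line
      if ($1 in seen) reappear++             # name came back non-contiguously
      seen[$1] = 1; prev = $1; len = 0; runs++
    }
    { len++ }
    END {
      if (NR > 0 && len == 1) singletons++   # account for the final run
      printf "runs=%d singletons=%d reappear=%d\n", runs, singletons, reappear
    }
  '
}

# Example use (requires samtools; INPUT.bam is a placeholder):
#   samtools view INPUT.bam | head -n 1000 | cut -f1 | summarize_name_runs
```

For a name-sorted paired-end BAM, singletons and reappear should both be at or near zero (the last run can be cut short by head). Many singletons suggest the file is not grouped by read name.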
You can definitely re-sort the BAM and force name sorting with the samtools command you suggested above, if needed.
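After re-sorting, the header can be checked again to confirm the result. The sketch below assumes a tab-delimited SAM header on standard input. `header_sort_order` is a hypothetical helper name, not a samtools subcommand. It prints the SO value from the @HD line, or "unknown" if no SO tag is present.

```shell
# Hypothetical helper (not a samtools subcommand).
# Reads a SAM header on stdin and prints the SO value of the @HD line,
# or "unknown" when there is no @HD line or no SO tag.
header_sort_order() {
  awk -F'\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
      exit                                   # only the @HD line matters
    }
    END { if (!found) print "unknown" }
  '
}

# Example use (requires samtools; file names are placeholders):
#   samtools sort -n -o name_sorted_output.bam input.bam
#   samtools view -H name_sorted_output.bam | header_sort_order
```

A name-sorted file written by samtools sort -n is expected to report queryname, while the STAR output described in this thread would report coordinate.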
I would stick with BAM mode so that the IRFinder results are more consistent and comparable with your other results derived from the same BAM.
@dg520 Thanks so much for your reply! I tried the samtools view command and it does look like the .bam files that are output from STAR (on usegalaxy.org) are sorted by coordinate. I'll just try using the samtools sort command and then use them for IRFinder. Thanks again!
Hello,
I've been able to create a Reference correctly and was now focusing on quantifying intron retention. I previously mapped my RNA-seq reads using STAR on the main Galaxy servers and am trying to determine whether these files are coordinate sorted or unsorted. From what I can tell, it looks like these files are coordinate sorted since the header contains the following:
@HD VN:1.4 SO:coordinate
in the first line of the output in the .bam files on Galaxy. It seems like these output .bam files are not the correct input then for IRFinder. I had a couple questions I was hoping someone might be able to help with:
samtools sort -o name_sorted_output.bam -n input.bam
to re-sort these .bam files by read name rather than by coordinate? If re-sorting by name isn't an option, I can use the FASTQ mode of IRFinder, but I've already completed other analyses with the .bam files I have from Galaxy and wanted to use those for IRFinder as well.
Thanks so much for any help you might be able to provide.
~Seth