gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Error: the input alignment file is not sorted! even though the file is sorted #300

Closed venuraherath closed 4 years ago

venuraherath commented 4 years ago

Hi,

I am getting the following error when I try to use the StringTie merge function:

$ stringtie --merge -p $threads -G DM_1-3_516_R44_potato.v6.1.working_models.gff3 -o stringtie_merged.gtf mergelist.txt

Error: the input alignment file is not sorted!
Alignments (7) already found for scaffold_128 !

As suggested in previous posts, I checked the indexed BAM files and they look fine. Please see below.

$ samtools view -H PVX_2d_1.bam | grep "HD"

Output

@HD VN:1.0 SO:coordinate
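
The same header check can be repeated over every BAM file with a small helper. This is only a minimal sketch: the function name `sam_sort_order` is hypothetical, and it reports what the header declares, not whether the records are actually in that order. Matching on `^@HD` rather than a bare "HD" also avoids picking up other header lines that happen to contain those letters.

```shell
# Print the SO (sort order) value from the @HD line of a SAM/BAM header
# read on stdin; print "unknown" if there is no @HD line or no SO tag.
# sam_sort_order is a hypothetical helper name, not part of samtools.
sam_sort_order() {
  awk -F'\t' '
    /^@HD/ {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
    }
    END { if (!found) print "unknown" }
  '
}

# Usage (assumes samtools is on PATH and the BAM files are in the current directory):
# for bam in *.bam; do
#   printf '%s\t%s\n' "$bam" "$(samtools view -H "$bam" | sam_sort_order)"
# done
```

Any file that does not report `coordinate` would be worth re-sorting before assembly.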

I checked mergelist.txt and it contains no stray spaces or empty lines. It lists twelve samples. Before the merge step, I assembled transcripts from the indexed BAM files using:

$ stringtie --rf -p $threads -G DM_1-3_516_R44_potato.v6.1.working_models.gff3 -o ${SAMPLE}.gtf -l ${SAMPLE} ${SAMPLE}.bam
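
The manual inspection of mergelist.txt could also be scripted. The sketch below assumes the list holds one GTF path per line (the thread later refers to "the gtf files listed in mergelist.txt"); `check_mergelist` is a hypothetical helper, not a StringTie feature. It flags empty lines, leading or trailing whitespace (including a stray carriage return from Windows line endings), and paths that cannot be read.

```shell
# Report blank lines, leading/trailing whitespace, and entries that do not
# point to a readable file. Returns non-zero if any problem is found.
# check_mergelist is a hypothetical helper name.
check_mergelist() {
  local list=$1 status=0 n=0 line
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    case $line in
      '')
        echo "line $n: empty line"; status=1 ;;
      [[:space:]]*|*[[:space:]])
        echo "line $n: leading or trailing whitespace"; status=1 ;;
      *)
        [ -r "$line" ] || { echo "line $n: cannot read $line"; status=1; } ;;
    esac
  done < "$list"
  return $status
}

# Usage:
# check_mergelist mergelist.txt && echo "mergelist.txt looks clean"
```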

Can you help me in this regard? Thanks in advance.

gpertea commented 4 years ago

Before anything else, can you please tell me what StringTie version you are running there? (stringtie --version should show that)

venuraherath commented 4 years ago

My apologies. I am running StringTie/2.1.3 on the ADA cluster.

gpertea commented 4 years ago

No apologies needed - it's good to know this is not the sorting issue that I knew an older version of StringTie had. I'm going to look into it. Thank you for reporting this issue.

venuraherath commented 4 years ago

Awesome! Thank you!

gpertea commented 4 years ago

Would it be possible for you to package all the gtf files listed in mergelist.txt and the reference DM_1-3_516_R44_potato.v6.1.working_models.gff3 into a compressed file (zip or tar.gz) and share that file with me? (perhaps attach here if it's not too big) It would be easier for me if I worked directly with the data you had there already that triggered this bug, with the reference sequence (scaffold) names you have there.
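
One possible way to build such an archive is sketched below. It assumes a `tar` that accepts `-T` to read file names from a list (GNU tar and bsdtar both do) and that the paths in the list are relative to the current directory. `bundle_merge_inputs` and the output name `stringtie_issue.tar.gz` are hypothetical.

```shell
# Pack every file named in the merge list, plus the reference annotation,
# into one gzip-compressed tarball.
# Arguments: <merge list> <reference GFF/GTF> <output .tar.gz>
# bundle_merge_inputs is a hypothetical helper name.
bundle_merge_inputs() {
  local list=$1 ref=$2 out=$3
  tar -czf "$out" -T "$list" "$ref"
}

# Usage:
# bundle_merge_inputs mergelist.txt \
#   DM_1-3_516_R44_potato.v6.1.working_models.gff3 stringtie_issue.tar.gz
```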

venuraherath commented 4 years ago

Yeah, sure. Would you mind if I email you the link, since the dataset is unpublished? Thank you for understanding.

gpertea commented 4 years ago

Apologies for my memory lapse: the fix I had in mind earlier for an older version was actually for exactly v2.1.3 (and it was the main reason v2.1.3b was released). This commit addressed the bug you just encountered:

I strongly recommend upgrading to the latest available release (currently v2.1.4) which does include the patch for this issue and more fixes.
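
To check whether an installation is at or above the recommended release, a version comparison along these lines may help. It is a sketch under a few assumptions: `sort -V` (version sort) is available, as in GNU coreutils; `stringtie --version` prints a bare version string such as 2.1.3; and `version_at_least` is a hypothetical helper. The threshold is v2.1.4, so the patched v2.1.3b would be reported as older even though it reportedly contains the fix.

```shell
# Succeed if version $1 is greater than or equal to version $2,
# relying on the version ordering of `sort -V`.
# version_at_least is a hypothetical helper name.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (assumes stringtie is on PATH):
# if version_at_least "$(stringtie --version)" 2.1.4; then
#   echo "StringTie is 2.1.4 or newer"
# else
#   echo "StringTie is older than 2.1.4; consider upgrading"
# fi
```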