gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Merge option gives different transcript numbers depending upon how the GTFs are provided #335

Closed: sanyalab closed this issue 3 years ago

sanyalab commented 3 years ago

Hello,

I encountered a curious issue while working with the stringtie --merge option. Depending on how I provide my list of GTFs, I get a different number of transcripts. Please advise on the correct method of providing the list, or let me know if this is a bug.

If I provide the GTFs directly on the command line, as below, I get 89986 transcripts:

```
stringtie --merge -G HighConfAnnot.gff3 -o Merged_trans.gtf -f 0.1 EarlyLeaf.gtf Stem.gtf
grep -w -v '#' Merged_trans.gtf | gawk '{if($3=="transcript"){print}}' | wc -l
89986
```
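To recount transcripts independently of `grep`/`gawk`, here is a minimal Python sketch of the same count. It assumes standard tab-separated GTF with the feature type in column 3; `count_transcripts` is a hypothetical helper, not part of StringTie. Unlike the `grep -w -v '#'` step, it only skips lines that begin with `#`, so it will not drop a data line that happens to contain a standalone `#` in its attributes.

```python
import io
from typing import Iterable


def count_transcripts(lines: Iterable[str]) -> int:
    """Count GTF records whose feature type (column 3) is 'transcript'.

    Hypothetical helper. Assumes tab-separated GTF where header/comment
    lines start with '#'.
    """
    count = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2] == "transcript":
            count += 1
    return count


# Illustrative in-memory GTF: one transcript record and one exon record.
example_gtf = io.StringIO(
    "# illustrative header line\n"
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1";\n'
    'chr1\tStringTie\texon\t100\t300\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1";\n'
)
print(count_transcripts(example_gtf))  # prints 1
```

On a real output file, `with open("Merged_trans.gtf") as gtf: print(count_transcripts(gtf))` gives the count for that file.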

If I provide the list of GTFs as a file (one GTF entry per line, with the full path to each GTF file), I get 104673 transcripts:

```
stringtie --merge -G HighConfAnnot.gff3 -o Merged_trans.gtf -f 0.1 gtf.list
grep -w -v '#' Merged_trans.gtf | gawk '{if($3=="transcript"){print}}' | wc -l
104673
```

Please advise.

Thanks,
Abhijit

gpertea commented 3 years ago

I could not reproduce this (with 4 GTF files and the current version, using the -f 0.1 option). I assume there are only 2 file paths in gtf.list, pointing to exactly the same 2 files as in your test case (EarlyLeaf.gtf and Stem.gtf)? They don't have to be full paths; they can also be relative paths, or just the file names if the files are in the current directory.
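Whether `gtf.list` really points to the same files as the command-line arguments can be checked mechanically. Below is a minimal Python sketch, assuming the list file holds one path per line and that relative entries resolve against the current working directory; `read_gtf_list` and `compare_inputs` are hypothetical helpers, not StringTie functions. It reports paths found only in the list, only on the command line, or more than once in the list.

```python
"""Compare a merge list file with GTF paths given on the command line.

Hypothetical helpers for a sanity check; not part of StringTie.
"""
import os
from typing import List, Tuple


def read_gtf_list(list_path: str) -> List[str]:
    """Read one path per line, skipping blank lines and stripping whitespace."""
    with open(list_path) as fh:
        return [line.strip() for line in fh if line.strip()]


def compare_inputs(
    list_path: str, cli_paths: List[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (only_in_list, only_on_cli, duplicated_in_list).

    Every path is canonicalized with os.path.realpath, so relative names,
    absolute paths and symlinks to the same file compare equal.
    """
    listed = [os.path.realpath(p) for p in read_gtf_list(list_path)]
    given = {os.path.realpath(p) for p in cli_paths}

    seen = set()
    duplicates = set()
    for path in listed:
        if path in seen:
            duplicates.add(path)  # same file listed more than once
        seen.add(path)

    only_in_list = sorted(seen - given)
    only_on_cli = sorted(given - seen)
    return only_in_list, only_on_cli, sorted(duplicates)
```

For the case in this issue, `compare_inputs("gtf.list", ["EarlyLeaf.gtf", "Stem.gtf"])` should return three empty lists if both invocations really received the same two files.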

Also, what version of stringtie is this? If it's the current version and you get this with the 2 file names in the gtf.list file, maybe you could share those 2 GTFs (and the HighConfAnnot.gff3) so I can reproduce the issue here?

sanyalab commented 3 years ago

Hello,

I cannot share these results as they are proprietary, but I can generate some with public data and send them across for review. I am using the latest version, 2.1.7.

Thanks,
Abhijit