gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Using the new --mix option for short and long-read data #336

Open niradsp opened 3 years ago

niradsp commented 3 years ago

I have previously used FLAIR for the so-called "hybrid" method, in which short-read data is used to correct long-read data. I wanted to test StringTie's new --mix option. Here is how I am running it: stringtie -G --mix -o -p 10 -e

This will calculate the abundance.
Also, I am wondering how the TPM value is calculated. Is it based more on the long-read data or the short-read data? I think FLAIR is more "long-read-centric"; in other words, its results are based on the Nanopore output. What about StringTie? Are the TPM values more short-read-based or long-read-based?

Another question: how do I extract tab-delimited TPM data? I am currently parsing each GTF file and extracting the ENST ID along with the TPM value. Is this fine, or is there another method? I notice that the -A option gives me a file at the gene level, not the isoform level. I am mainly interested in isoform usage.
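
For illustration, the GTF-parsing approach described above can be sketched in Python roughly as follows. This is a minimal sketch, not an official StringTie tool: it assumes that each transcript line in the output GTF carries transcript_id and TPM attributes in the ninth column, and the function names (transcript_tpms, write_tsv) and the file names in the usage comment are hypothetical.

```python
import csv
import re

# Matches `key "value";` pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def transcript_tpms(gtf_lines):
    """Yield (transcript_id, tpm) for every transcript record in a GTF.

    Assumption: transcript lines carry `transcript_id` and `TPM` attributes.
    Lines lacking either attribute are skipped.
    """
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only well-formed transcript-level records (not exons).
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "transcript_id" in attrs and "TPM" in attrs:
            yield attrs["transcript_id"], float(attrs["TPM"])


def write_tsv(gtf_path, tsv_path):
    """Write a two-column, tab-delimited table of transcript ID and TPM."""
    with open(gtf_path) as gtf, open(tsv_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["transcript_id", "TPM"])
        writer.writerows(transcript_tpms(gtf))


# Hypothetical usage (file names are placeholders):
# write_tsv("sample1.gtf", "sample1_tpm.tsv")
```

Filtering on the "transcript" feature type keeps one row per isoform, since exon lines repeat the transcript ID.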

With just the command above, I am getting far more isoforms than with FLAIR, and the isoforms that I found significant with FLAIR also appear in StringTie's output. Please let me know if the command above looks good.

Thanks in advance, Nirad