gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Gffcompare and Stringtie FPKM estimation? #291

Open TommySchooner opened 4 years ago

TommySchooner commented 4 years ago

Hi there,

I have been trying to use the StringTie-Ballgown pipeline to run a differential expression analysis of isoforms.

Here is an overview of the steps I have taken:

  1. StringTie de novo and reference-guided assembly of the 3 samples (sample 01, sample 02 and sample 03).
  2. gffcompare to merge the 6 GTF files generated in step 1.
  3. Use the gffcompare-merged GTF file as the reference for a reference-guided StringTie run, to generate the output files for Ballgown:

stringtie /Users/apple/Desktop/Final_Project/2.\ Assemble\ transcriptome/Sp100/M11_combined_Sp100_stringtie/M11_combined_GrCH38_splitNC_000002.12_sorted.bam -L -G /Users/apple/Desktop/Final_Project/2.\ Assemble\ transcriptome/Sp100/All_gffcomparemerge/denovo+reference_modified/gffcompare_combine_all_modified.gtf -B -o M11_gffcompareall.gtf -f 0.001 -p 8 -A abundance_GrCHannotation.txt

Since the gffcompare-merged GTF was built from the isoforms that StringTie itself assembled in step 1, I would expect the reference-guided run to quantify those transcripts according to their usage.

Indeed, for sample 02 and sample 03 the reference-guided StringTie run worked and gave FPKM values for each isoform that are similar to what I expected.

However, when I looked at the t_data.ctab file for sample 01, apart from 3 main isoforms (each with FPKM > 100), the FPKM of all the other isoforms is 0. It seems that the reference-guided run failed to match the reads to the isoforms that StringTie had produced previously, so it cannot quantify the isoforms that it generated itself.
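
To make the observation above easy to reproduce, here is a minimal Python sketch that splits the transcripts in a t_data.ctab file into those with FPKM above a threshold and those at or below it. It assumes the usual Ballgown table layout (tab-separated, with a header row that includes `t_name` and `FPKM` columns). The function name `split_by_fpkm` is hypothetical, and the sample rows are invented purely for illustration; they are not taken from the real data.

```python
import csv
import io


def split_by_fpkm(ctab_file, threshold=0.0):
    """Return (expressed, silent) lists of (t_name, FPKM) pairs.

    ctab_file: an open text file (or any iterable of lines) in the
    tab-separated t_data.ctab layout, with a header row.
    A transcript is 'expressed' when its FPKM is strictly above threshold.
    """
    reader = csv.DictReader(ctab_file, delimiter="\t")
    expressed, silent = [], []
    for row in reader:
        fpkm = float(row["FPKM"])
        target = expressed if fpkm > threshold else silent
        target.append((row["t_name"], fpkm))
    return expressed, silent


# Invented example rows (assumption: standard t_data.ctab column order).
SAMPLE = (
    "t_id\tchr\tstrand\tstart\tend\tt_name\tnum_exons\tlength\t"
    "gene_id\tgene_name\tcov\tFPKM\n"
    "1\tchr2\t+\t100\t900\tTCONS_00000001\t3\t600\tXLOC_000001\t.\t25.0\t120.5\n"
    "2\tchr2\t+\t100\t950\tTCONS_00000002\t4\t700\tXLOC_000001\t.\t0.0\t0.0\n"
)

expressed, silent = split_by_fpkm(io.StringIO(SAMPLE))
print("expressed:", [name for name, _ in expressed])
print("zero FPKM:", [name for name, _ in silent])
```

On real output, the same function can be called with `open("t_data.ctab")` from each sample's Ballgown directory, which makes it straightforward to compare which transcript IDs drop to zero in sample 01 but not in samples 02 and 03.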

What is the solution to this problem? And why does StringTie fail to recognise isoforms that it assembled itself? Thank you very much for your help!

TommySchooner commented 4 years ago

Hello there,

Meanwhile, I am running the same pipeline on another data set, this time of short reads. I am facing the same problem there, and I have no idea why it happens.

Would it be possible for you to help a bit, since this is an important step in my analysis?

Thank you!

Best regards,