Gffcompare and Stringtie FPKM estimation?

Hi there,

I have been trying to use the Stringtie-ballgown pipeline to conduct a differential expression analysis of the isoforms.

Here is an overview of the steps I have taken:

1. Stringtie de novo and reference-guided assembly of the 3 samples (sample 01, sample 02 and sample 03).
2. gffcompare to merge the 6 GTF files generated in step 1.
3. Use the gffcompare-merged GTF file for reference-guided assembly with Stringtie, generating the output files for ballgown.

stringtie /Users/apple/Desktop/Final_Project/2.\ Assemble\ transcriptome/Sp100/M11_combined_Sp100_stringtie/M11_combined_GrCH38_splitNC_000002.12_sorted.bam -L -G /Users/apple/Desktop/Final_Project/2.\ Assemble\ transcriptome/Sp100/All_gffcomparemerge/denovo+reference_modified/gffcompare_combine_all_modified.gtf -B -o M11_gffcompareall.gtf -f 0.001 -p 8 -A abundance_GrCHannotation.txt

Since the gffcompare-merged GTF is built from the isoforms identified by the Stringtie assemblies in step 1, I would expect the reference-guided assembly to be able to quantify those transcripts according to their usage.

Indeed, for sample 02 and sample 03 the reference-guided Stringtie assembly worked and gave FPKM values for each isoform that are similar to what I expected.

However, when I looked at the t_data.ctab file for sample 01, apart from 3 main isoforms (which have FPKM > 100), the FPKM of all the other isoforms is 0. It seems that the reference-guided assembly failed to match the isoforms to the ones it produced previously, which means it cannot quantify the isoforms that it generated itself.
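
For reference, here is a minimal sketch of how the zero-FPKM isoforms can be counted in a t_data.ctab file. The function name `summarize_fpkm` is only illustrative, and the sketch assumes a tab-delimited file whose header row contains a column named FPKM (the column is found by name, not by position).

```shell
# Summarise a ballgown t_data.ctab file: total number of transcripts and
# how many of them have FPKM equal to 0.
# Assumption: tab-delimited input with a header row containing "FPKM".
summarize_fpkm() {
    awk -F '\t' '
        NR == 1 {
            # Locate the FPKM column from the header row.
            for (i = 1; i <= NF; i++) if ($i == "FPKM") col = i
            if (!col) { print "no FPKM column found" > "/dev/stderr"; exit 1 }
            next
        }
        # Count every transcript row, and separately those with FPKM of 0.
        { total++; if ($col + 0 == 0) zero++ }
        END { if (col) printf "total=%d zero_fpkm=%d\n", total, zero }
    ' "$1"
}

# Usage (the path is a placeholder): summarize_fpkm path/to/t_data.ctab
```

Running this on the t_data.ctab of each sample makes it easy to compare how many isoforms end up with FPKM 0 in sample 01 versus samples 02 and 03.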

What could be the solution to this problem? And what is the reason Stringtie fails to recognise the isoforms that it assembled itself? Thank you very much for your help!
