gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Large changes in FPKMs in the same genes and samples when using different annotation files in -e #263

Open Wylezils opened 4 years ago

Wylezils commented 4 years ago

Running StringTie v2.1 in -e mode produces up to 10-fold changes in FPKM values for the same genes/locations in the same samples when a shortened version of the same annotation file is provided.

We were interested in gene expression on one particular chromosome rather than the whole genome, but noticed that when we provided a shortened reference GTF and ran in -e mode, the FPKM values rose significantly.

I can't find any documentation on FPKM values changing with the size of the provided reference GTF file. Could anyone shed some light on this issue? It seems surprising that the reference file, which is completely independent of the input samples, can have such a profound impact on the resulting FPKM values.

Many thanks for any help on this issue!
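One possible reading of the behaviour described above, offered as an assumption and not confirmed anywhere in this thread: FPKM divides by the total number of fragments counted for the sample. If, in -e mode, that total only covers reads in regions that have a reference transcript, a shorter GTF shrinks the denominator and every FPKM rises by the same factor. The sketch below only illustrates the arithmetic; all numbers are hypothetical, and `fpkm` is a helper written for this example, not a StringTie function.

```python
def fpkm(fragments, length_bp, total_fragments):
    """Standard FPKM definition:
    fragments per kilobase of transcript per million counted fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)

# Hypothetical values, for illustration only.
fragments = 500            # fragments assigned to one transcript
length_bp = 2000           # transcript length in base pairs

# Assumption: the total is counted only over annotated regions,
# so it depends on which GTF was supplied.
total_full = 20_000_000    # total counted with the whole-genome GTF
total_subset = 2_000_000   # total counted with a one-chromosome GTF

fpkm_full = fpkm(fragments, length_bp, total_full)
fpkm_subset = fpkm(fragments, length_bp, total_subset)
fold_change = fpkm_subset / fpkm_full

print(f"full GTF:   {fpkm_full:.2f}")
print(f"subset GTF: {fpkm_subset:.2f}")
print(f"fold change: {fold_change:.1f}")
```

Under this assumption, the fold change equals the ratio of the two totals, and the transcript's own fragment count and length cancel out. Whether StringTie actually computes the total this way would need to be confirmed by the maintainers.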

ZHIDIHUAYUAN commented 3 years ago

Hi, I have the same problem as you. When I used the full GTF file and a subset of it, the results differed.

(1) The full GTF file:

TCONS_00006749 0.000000 0.071092 128.187653
TCONS_00000678 91.014633 11.815572 13.136337
TCONS_00000679 15.020497 1.391409 5.266821
TCONS_00000680 10.670300 0.000000 0.000000

(2) A subset of the GTF file:

TCONS_00006749 0.000000 5354.114258 874459.000000
TCONS_00000678 779866.375000 889856.062500 89612.281250
TCONS_00000679 128704.359375 104790.015625 35928.726562
TCONS_00000680 91429.335938 0.000000 0.000000

Which GTF should I use to calculate TPM?

Because I want to filter transcripts based on their TPM values, I first calculated the TPMs of the known transcripts to see their distribution, and then calculated the TPMs of the novel transcripts in order to filter them.

Have you ever figured it out?
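The numbers posted above can be checked directly. By definition, TPM values sum to one million over whichever transcripts are quantified, so restricting the GTF to a few transcripts rescales their TPMs. The sketch below uses only the values from this comment; treating the three value columns as three samples is an assumption, since the columns are not labelled.

```python
# Values copied from the comment above; columns assumed to be three samples.
full = {
    "TCONS_00006749": [0.000000, 0.071092, 128.187653],
    "TCONS_00000678": [91.014633, 11.815572, 13.136337],
    "TCONS_00000679": [15.020497, 1.391409, 5.266821],
    "TCONS_00000680": [10.670300, 0.000000, 0.000000],
}
subset = {
    "TCONS_00006749": [0.000000, 5354.114258, 874459.000000],
    "TCONS_00000678": [779866.375000, 889856.062500, 89612.281250],
    "TCONS_00000679": [128704.359375, 104790.015625, 35928.726562],
    "TCONS_00000680": [91429.335938, 0.000000, 0.000000],
}

n_columns = 3

# Sum of each column in the subset run.
column_sums = [sum(values[i] for values in subset.values())
               for i in range(n_columns)]

# Per-column scale factors (subset / full) for transcripts with nonzero values.
scale_factors = [
    [subset[t][i] / full[t][i] for t in full if full[t][i] > 0]
    for i in range(n_columns)
]

for i in range(n_columns):
    factors = ", ".join(f"{f:.1f}" for f in scale_factors[i])
    print(f"column {i + 1}: sum = {column_sums[i]:.2f}, scale factors = {factors}")
```

Each subset column sums to roughly 1,000,000, and within a column every transcript is scaled by nearly the same factor. The relative abundances within a sample are therefore unchanged, but the absolute values are not comparable between the two runs, so a TPM cutoff derived from one annotation should not be applied to values computed with a different one.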