gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

The relation between a gene ID and its derived transcript IDs in FPKM? #172

Open · brightbio opened 6 years ago

brightbio commented 6 years ago

Dear author, I am trying to analyse differential gene expression in Zea mays, which has a well-annotated genome, following the simplified three-step protocol (depicted below).

[Image: the three-step protocol]

I ran into the following issue: within the same comparison group, a gene ID was significantly down-regulated, while some of its derived transcript IDs were significantly down-regulated, others significantly up-regulated, and others unchanged, like this:

[Screenshot: example results]

Now I am wondering what the relation is between the FPKM of a gene ID and the FPKM of its derived transcript IDs. Looking forward to your reply.

brightbio commented 6 years ago

@gpertea Looking forward to your reply. Thanks a lot.

gpertea commented 6 years ago

Isn't it possible for the overall change of the gene to be down-regulation even though one of its transcripts is significantly up-regulated (if the others are down-regulated)? It's hard to tell what's in your data, and I do not have detailed knowledge of the inner workings of those significance tests, but I guess rare situations like this are possible, where a minority of transcripts appear up-regulated between samples even though the overall expression level of the whole gene decreases.
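The arithmetic behind this can be illustrated with a small sketch. The FPKM values below are made up for illustration (they are not from the issue), and the gene-level value is assumed to be the simple sum of its transcripts' FPKM. It only shows fold changes, not any significance test.

```python
import math

# Hypothetical FPKM values for one gene with three isoforms in two conditions
# (illustrative numbers only, not taken from the issue).
control = {"tx1": 40.0, "tx2": 10.0, "tx3": 5.0}
treated = {"tx1": 12.0, "tx2": 18.0, "tx3": 5.0}


def log2_fc(before: float, after: float) -> float:
    """Log2 fold change from 'before' to 'after' (both assumed > 0)."""
    return math.log2(after / before)


# Per-transcript changes: tx1 goes down, tx2 goes up, tx3 is unchanged.
for tx in control:
    print(f"{tx}: log2FC = {log2_fc(control[tx], treated[tx]):+.2f}")

# Gene-level FPKM assumed to be the sum of its transcripts' FPKM.
gene_before = sum(control.values())
gene_after = sum(treated.values())
print(f"gene: {gene_before} -> {gene_after}, "
      f"log2FC = {log2_fc(gene_before, gene_after):+.2f}")
```

Because the dominant isoform (tx1) drops sharply, the gene total still falls even though tx2 rises, which is the pattern described in the question.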

brightbio commented 6 years ago

@gpertea Thanks for your kind reply. Maybe I did not make my issue clear. Actually, I want to know whether the FPKM of a gene ID is equal to ∑ (FPKM of the transcript IDs from the same gene ID), or whether it is equal to the FPKM of the longest transcript from that gene ID. Is there some other relation between the FPKM of a gene ID and the FPKMs of its derived transcript IDs? Looking forward to your reply. Thanks a lot.

gpertea commented 6 years ago

It is generally the sum, but there could be reads that StringTie does not place into any of the assembled transcripts. Those reads won't contribute to the coverage of any (reported) transcript, yet they can contribute to the coverage of the gene, because they are mapped to that locus, just not assembled into transcripts. They should be a minority, though (unless it's a very noisy region, or the mappings were incorrect/problematic). Overall, the gene coverage is the sum of its component transcripts' coverage, though the numbers might not match exactly.
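One way to check this on your own output is to sum the transcript FPKMs per gene and compare them with the gene-level values. The sketch below assumes StringTie-style GTF lines whose `transcript` records carry `gene_id` and `FPKM` attributes, and takes the gene FPKMs as a plain dict (e.g. loaded from the gene abundance table written with StringTie's `-A` option). The function names, the example records and the 5% tolerance are all hypothetical.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcript_fpkm_by_gene(gtf_lines):
    """Sum the FPKM attribute of 'transcript' records per gene_id.

    Assumes StringTie-style GTF lines where transcript records carry
    gene_id and FPKM attributes; exon and other records are skipped.
    """
    totals = defaultdict(float)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "FPKM" in attrs:
            totals[attrs["gene_id"]] += float(attrs["FPKM"])
    return dict(totals)


def compare_to_gene_fpkm(transcript_sums, gene_fpkm, rel_tol=0.05):
    """Return {gene: (gene_fpkm, transcript_sum)} for genes whose reported
    FPKM differs from the transcript sum by more than rel_tol (relative to
    the gene value). Genes with a gene FPKM of 0 are skipped."""
    mismatches = {}
    for gene, g_val in gene_fpkm.items():
        t_sum = transcript_sums.get(gene, 0.0)
        if g_val > 0 and abs(g_val - t_sum) / g_val > rel_tol:
            mismatches[gene] = (g_val, t_sum)
    return mismatches


# Hypothetical records for one gene with two transcripts.
example = [
    'chr1\tStringTie\ttranscript\t100\t500\t1000\t+\t.\tgene_id "G1"; transcript_id "G1.1"; FPKM "6.0";',
    'chr1\tStringTie\texon\t100\t200\t1000\t+\t.\tgene_id "G1"; transcript_id "G1.1"; exon_number "1";',
    'chr1\tStringTie\ttranscript\t100\t600\t1000\t+\t.\tgene_id "G1"; transcript_id "G1.2"; FPKM "4.0";',
]
sums = transcript_fpkm_by_gene(example)
print(sums)
print(compare_to_gene_fpkm(sums, {"G1": 10.3}))
```

A small relative tolerance is used rather than exact equality because, as noted above, reads mapped to the locus but not assembled into any transcript can make the gene value slightly larger than the transcript sum.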