gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

TPM values do not add up to 1 million; coverage and TPM values do not add up. #293

Open harish3689 opened 4 years ago

harish3689 commented 4 years ago

3 transcripts were identified in a sample. Their TPM values add up to ~546,000 and not 1 million like I would expect.

stringtie -p 16 -j 10 -c 10 -s 10 -o output.stringtie.gtf input.Aligned.sortedByCoord.out.bam

StringTie version 2.0

gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "417.328430"; FPKM "222464.998000"; TPM "376040.909959";
gene_id "STRG.2"; transcript_id "STRG.2.1"; cov "129.827484"; FPKM "69207.053467"; TPM "116983.271955";
gene_id "STRG.2"; transcript_id "STRG.2.2"; cov "59.716179"; FPKM "31832.865165"; TPM "53808.283060";

Given these coverages, the scaling factor should be (417 + 130 + 60) / 1000000 = 0.000607, which means the TPM for the first transcript should be 417 / 0.000607 ≈ 686985?
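The arithmetic in the question can be reproduced with the unrounded values. This is a minimal sketch, not StringTie's own code. It assumes TPM is simply each transcript's `cov` rescaled so that the three printed transcripts sum to one million. The helper name `tpm_from_coverage` is made up for illustration.

```python
# Coverage and TPM values copied from the GTF attributes quoted above.
cov = {
    "STRG.1.1": 417.328430,
    "STRG.2.1": 129.827484,
    "STRG.2.2": 59.716179,
}
reported_tpm = {
    "STRG.1.1": 376040.909959,
    "STRG.2.1": 116983.271955,
    "STRG.2.2": 53808.283060,
}


def tpm_from_coverage(cov_by_tx):
    """Rescale coverages to sum to one million.

    Assumption: TPM is proportional to per-base coverage and is normalized
    over exactly the transcripts passed in (hypothetical helper).
    """
    total = sum(cov_by_tx.values())
    return {tx: c / total * 1_000_000 for tx, c in cov_by_tx.items()}


expected_tpm = tpm_from_coverage(cov)
reported_sum = sum(reported_tpm.values())

for tx in cov:
    print(f"{tx}: expected {expected_tpm[tx]:.1f}, reported {reported_tpm[tx]:.1f}")
print(f"sum of reported TPM: {reported_sum:.1f}")
```

Under that assumption the first transcript comes out near 687,000 TPM, in line with the rounded estimate in the question, while the reported values sum to roughly 546,800.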

mpertea commented 4 years ago

StringTie does not print transcripts that are identified at very low levels, because it assumes they come from transcriptional noise or are poorly assembled. We think it would be wrong to assign all of the observed transcription to the printed transcripts, as many of the reads overlapping the bundle might not even be compatible with the isoforms that were assembled.
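As a rough illustration of this point, the numbers posted in the question can be used to estimate how much coverage went to transcripts that were not printed. This is a sketch under a stated assumption: that TPM equals `cov` multiplied by a single sample-wide factor computed over all assembled transcripts, printed or not. The thread does not confirm that this is how StringTie computes it. The variable names are illustrative.

```python
# (transcript_id, cov, TPM) copied from the GTF attributes in the question.
records = [
    ("STRG.1.1", 417.328430, 376040.909959),
    ("STRG.2.1", 129.827484, 116983.271955),
    ("STRG.2.2", 59.716179, 53808.283060),
]

# If TPM = cov * factor, the ratio TPM / cov should be the same for every row.
ratios = [tpm / cov for _, cov, tpm in records]
spread = max(ratios) - min(ratios)
factor = sum(ratios) / len(ratios)

# Assumption: factor = 1e6 / (total coverage over ALL assembled transcripts).
implied_total_cov = 1_000_000 / factor
printed_cov = sum(cov for _, cov, _ in records)
unprinted_cov = implied_total_cov - printed_cov
printed_fraction = printed_cov / implied_total_cov

print(f"TPM/cov ratio: {factor:.4f} (spread across rows: {spread:.2e})")
print(f"implied total coverage: {implied_total_cov:.1f}")
print(f"coverage of printed transcripts: {printed_cov:.1f}")
print(f"coverage not attributed to printed transcripts: {unprinted_cov:.1f}")
print(f"fraction of TPM that is printed: {printed_fraction:.4f}")
```

The TPM/cov ratio is essentially identical for all three rows, which is what a single shared scaling factor would produce. If the assumption holds, a bit under half of the total coverage belongs to transcripts that were filtered out of the output, which would explain why the printed TPM values sum to about 0.547 million.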