Open harish3689 opened 4 years ago
StringTie does not print all the transcripts that are identified at very low levels, because it assumes they are coming from transcriptional noise, or that they are poorly assembled. We think it is wrong to include all the transcription going on into the printed transcripts, as many of the reads overlapping the bundle might not even be compatible with the isoforms that were assembled.
3 transcripts were identified in a sample. Their TPM values add up to ~546,000 and not 1 million like I would expect.
stringtie -p 16 -j 10 -c 10 -s 10 -o output.stringtie.gtf input.Aligned.sortedByCoord.out.bam
StringTie version 2.0
gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "417.328430"; FPKM "222464.998000"; TPM "376040.909959";
gene_id "STRG.2"; transcript_id "STRG.2.1"; cov "129.827484"; FPKM "69207.053467"; TPM "116983.271955";
gene_id "STRG.2"; transcript_id "STRG.2.2"; cov "59.716179"; FPKM "31832.865165"; TPM "53808.283060";
Given these coverages, the scaling factor should be (417 + 130 + 60) / 1,000,000 = 0.000607, which means the TPM for the first transcript should be 417 / 0.000607 ≈ 686,985?
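To make the arithmetic easy to check, here is a minimal Python sketch using the unrounded values from the three records. It assumes that TPM is simply each transcript's coverage divided by the total coverage, times one million; that is my assumption, not something confirmed from the StringTie source. The names `records` and `tpm_from_coverage` are made up for this illustration.

```python
# Values copied from the three GTF records in this report.
records = [
    {"transcript_id": "STRG.1.1", "cov": 417.328430, "tpm": 376040.909959},
    {"transcript_id": "STRG.2.1", "cov": 129.827484, "tpm": 116983.271955},
    {"transcript_id": "STRG.2.2", "cov": 59.716179, "tpm": 53808.283060},
]


def tpm_from_coverage(covs):
    """Assumed normalization: coverage / total coverage * 1e6."""
    total = sum(covs)
    return [c / total * 1_000_000 for c in covs]


# Sum of the TPM values that StringTie actually printed.
reported_sum = sum(r["tpm"] for r in records)

# TPM I would expect if only the three printed transcripts were normalized.
expected = tpm_from_coverage([r["cov"] for r in records])

# Total coverage that each reported TPM implies under the same assumption:
# TPM = cov / total * 1e6  =>  total = cov / TPM * 1e6
implied_total = [r["cov"] / r["tpm"] * 1_000_000 for r in records]

print("sum of reported TPM:", round(reported_sum, 2))
for r, e, t in zip(records, expected, implied_total):
    print(r["transcript_id"], "expected TPM:", round(e, 2),
          "implied total coverage:", round(t, 2))
```

Under this assumption, all three reported TPM values imply the same total coverage, and that total is larger than the sum of the three printed coverages. That would be consistent with the normalization including transcripts that were not printed, but this is only an inference from the numbers above.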