gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

StringTie is unable to predict known transcripts. #353

Open unique379r opened 2 years ago

unique379r commented 2 years ago

Hi, I am trying to run StringTie on HiFi Iso-Seq reads (CCS FASTQ reads). I first run the deSALT aligner with the '--trans-strand' option and then provide the resulting sorted BAM to StringTie (v2.2.1). The commands are as follows:

deSALT aln -o sample_desalt.sam -T -t 4 -x ccs hg38deSALT_index/ sample.hifi_reads.fastq

stringtie -p 4 -L -A sample_genes.txt -o sample_stringtie_counts.gtf -G gencode.v34.basic.annotation.gtf sample.desalt.sorted.bam

The GTF output generated by StringTie does not seem to have any known Ensembl IDs in the second column; it has only "StringTie", which I suppose marks novel predictions by StringTie. Can you please tell me what is wrong with it?

## Counting the known Transcripts
grep -v '#' sample_stringtie_Out.gtf | awk '$2!="StringTie"' | wc -l

0
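
A possibly more informative check, offered as a sketch rather than a confirmed diagnosis: as far as I understand the StringTie output format, column 2 is the GTF source field and reads "StringTie" for every record, while an assembled transcript that matches a `-G` reference transcript carries a `reference_id` attribute in column 9. Assuming that holds for this version, the function below counts transcript records with and without that attribute. The function name `count_ref_matched` is made up for illustration, and the file is assumed to be tab-delimited.

```shell
# Count transcript records that carry a reference_id attribute.
# Assumption: matches to the -G annotation are tagged in column 9
# (reference_id "..."), not in the source column (column 2).
count_ref_matched() {
    awk -F'\t' '
        !/^#/ && $3 == "transcript" {
            total++
            if ($9 ~ /reference_id "/) matched++
        }
        END { printf "%d %d\n", matched, total }
    ' "$1"
}

# Usage (prints "<matched> <total>"):
#   count_ref_matched sample_stringtie_counts.gtf
```

If the first number is greater than zero, known transcripts were recovered even though column 2 never shows an Ensembl source.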

## OUT GTF

# StringTie version 2.2.1
chr1    StringTie   transcript  966476  975348  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "2.624335"; FPKM "0.282870"; TPM "1.248060";
chr1    StringTie   exon    966476  966614  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "2.827338";
chr1    StringTie   exon    966704  966803  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "2"; cov "3.000000";
chr1    StringTie   exon    970277  970423  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "3"; cov "3.000000";
chr1    StringTie   exon    970521  970601  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "4"; cov "3.000000";
chr1    StringTie   exon    970686  971006  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "5"; cov "2.632399";
chr1    StringTie   exon    971077  971208  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "6"; cov "2.727273";
chr1    StringTie   exon    971324  971404  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "7"; cov "3.000000";

Note: the gene abundance table generated by StringTie (the -A output) does contain both known and novel genes.

Gene ID Gene Name   Reference   Strand  Start   End Coverage    FPKM    TPM
ENSG00000187961.14  KLHL17  chr1    +   960584  965719  1.809155    0.195004    0.860383
ENSG00000187583.11  PLEKHN1 chr1    +   966482  975865  0.854442    0.092098    0.406349
ENSG00000187642.9   PERM1   chr1    -   975204  982093  0.042559    0.004587    0.020240
STRG.1  -   chr1    -   966476  975348  3.128678    0.337232    1.487912
ENSG00000187608.10  ISG15   chr1    +   1001138 1014540 196.094193  21.136450   93.256905
ENSG00000231702.2   AL645608.3  chr1    -   1008076 1008229 0.000000    0.000000    0.000000
ENSG00000224969.1   AL645608.1  chr1    -   1011997 1013193 0.000000    0.000000    0.000000
STRG.2  -   chr1    -   1012952 1014540 216.082062  40.503719   178.707962
ENSG00000187634.12  SAMD11  chr1    +   925731  944581  22.574600   2.433254    10.735847
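
To put numbers on that observation, the rows of the gene table can be split by gene ID. This is only a sketch: it assumes the table is tab-delimited with a single header row, and that novel loci keep the "STRG." prefix seen in the rows above. The function name `count_gene_classes` is hypothetical.

```shell
# Split the -A gene table into annotated vs. StringTie-assigned gene IDs.
# Assumption: novel loci use the "STRG." prefix, everything else is annotated.
count_gene_classes() {
    awk -F'\t' '
        NR == 1 { next }                  # skip the header row
        $1 ~ /^STRG\./ { novel++; next }  # StringTie-assigned locus
        { known++ }                       # any other gene ID
        END { printf "known=%d novel=%d\n", known, novel }
    ' "$1"
}

# Usage:
#   count_gene_classes sample_genes.txt
```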

Best,
Rupesh Kesharwani