gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Error: could not locate transcript ENST00000472017.1 #451

Open Upreti-Anil opened 1 week ago

Upreti-Anil commented 1 week ago

Hello StringTie team,

I'm running into an issue where certain transcripts that have reads in my BAM files do not appear in the GTF output StringTie writes for some samples. This makes prepDE.py fail when I try to generate transcript count matrices, because it cannot find those transcripts in the affected samples' GTF files.

My setup: I'm using StringTie to assemble transcripts and quantify expression from sorted BAM files, with the `-G` option pointing to a comprehensive GTF annotation file from GENCODE. For transcript quantification, I run StringTie with these parameters:

`stringtie -e $SORTED_BAM_FILE -o ${SAMPLE_NAME}.gtf -p $NUM_THREADS -G $GTF_FILE -A abundances.tab -C cov_refs.gtf -B`

The same error also shows up in other runs, for different transcripts and samples:

`Error: could not locate transcript ENST00000697250.1 entry for sample OPL_B`

`Error: could not locate transcript ENST00000607096.1 entry for sample CEXP_B`
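One way to narrow this down is to collect the transcript IDs from each sample's GTF and list which IDs each sample lacks compared with the others. The sketch below uses hypothetical helper names (it is not part of StringTie or prepDE.py) and assumes the standard nine-column, tab-separated GTF layout with `transcript` feature lines, as StringTie writes.

```python
import re

# Captures the value of an attribute like: transcript_id "ENST00000472017.1";
# \b keeps it from matching inside a longer name such as ref_transcript_id.
TID_RE = re.compile(r'\btranscript_id "([^"]+)"')


def transcript_ids(gtf_lines):
    """Return the set of transcript IDs found on 'transcript' feature lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == "transcript":
            match = TID_RE.search(fields[8])
            if match:
                ids.add(match.group(1))
    return ids


def missing_per_sample(ids_by_sample):
    """For each sample, list IDs present in at least one other sample but not in it."""
    all_ids = set().union(*ids_by_sample.values())
    return {name: sorted(all_ids - ids) for name, ids in ids_by_sample.items()}
```

For example, call `transcript_ids(open("OPL_B.gtf"))` for each sample (the file names just follow the `${SAMPLE_NAME}.gtf` pattern and are illustrative), then pass the resulting dictionary to `missing_per_sample`. Running `transcript_ids` on the reference annotation as well shows which reference transcripts a given sample's output dropped.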

Are there specific StringTie parameters that would make transcript detection more consistent across samples? And is there a recommended approach when reads for a transcript are present in the BAM file but the transcript is missing from StringTie's GTF output, especially for downstream differential expression analysis with prepDE.py?

Any insights or suggested settings would be much appreciated, as I'm aiming to build a comprehensive transcript count matrix for use with DESeq2.
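As a stopgap, a count matrix can be assembled that fills a missing transcript with 0 instead of aborting. This is a hypothetical sketch, not prepDE.py: it estimates counts with the formula the StringTie manual gives for prepDE.py (reads_per_transcript = coverage * transcript_len / read_len), assumes a read length of 75 (set it to match your data), and assumes a transcript absent from a sample's GTF can be treated as zero expression, which is worth checking before feeding the matrix to DESeq2.

```python
import csv
import re
from collections import defaultdict

# Parses GTF column-9 attributes such as: transcript_id "ENST00000472017.1"; cov "3.2";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def estimate_counts(gtf_lines, read_len=75):
    """Estimate integer read counts per transcript from one StringTie -e GTF.

    count ~= cov * transcript_length / read_len, where cov comes from the
    transcript line and transcript_length is the summed length of its exons.
    read_len=75 is an ASSUMED default; use your library's read length.
    """
    cov = {}
    length = defaultdict(int)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        if fields[2] == "transcript":
            cov[tid] = float(attrs.get("cov", "0"))
        elif fields[2] == "exon":
            # GTF coordinates are 1-based and inclusive
            length[tid] += int(fields[4]) - int(fields[3]) + 1
    # round() with one argument returns an int, which DESeq2 expects
    return {tid: round(c * length[tid] / read_len) for tid, c in cov.items()}


def build_rows(counts_by_sample):
    """Build a header and transcript x sample rows over the union of IDs.

    A transcript missing from a sample is filled with 0, ASSUMING that a
    missing entry means no expression in that sample.
    """
    samples = sorted(counts_by_sample)
    all_ids = set().union(*(c.keys() for c in counts_by_sample.values()))
    header = ["transcript_id"] + samples
    rows = [[tid] + [counts_by_sample[s].get(tid, 0) for s in samples]
            for tid in sorted(all_ids)]
    return header, rows


def write_count_matrix(counts_by_sample, out):
    """Write the zero-filled matrix as CSV to an open text file."""
    header, rows = build_rows(counts_by_sample)
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)
```

The resulting CSV has a `transcript_id` column followed by one integer column per sample, so it should be readable in R with `read.csv(..., row.names = "transcript_id")` as DESeq2 `countData`.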

Thank you!