Hello StringTie team,
I'm running into an issue where certain transcripts present in my BAM files are not appearing in the GTF output from StringTie. This causes errors when I try to generate transcript count matrices using prepDE.py, as it cannot locate these missing transcripts in some samples.
My Setup:
I’m using StringTie to assemble transcripts and quantify expression from sorted BAM files, with the -G option pointing to a comprehensive GTF annotation file (from Gencode).
For transcript quantification, I run StringTie with the parameters:
stringtie -e $SORTED_BAM_FILE -o ${SAMPLE_NAME}.gtf -p $NUM_THREADS -G $GTF_FILE -A abundances.tab -C cov_refs.gtf -B
Error: could not locate transcript ENST00000697250.1 entry for sample OPL_B ## error from different run
Error: could not locate transcript ENST00000607096.1 entry for sample CEXP_B ## error from different run
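To narrow down which reference transcripts are absent from a given sample's output, a check along these lines might help. It is only a minimal sketch: it assumes both files are standard tab-separated GTF with a `transcript_id "..."` attribute on `transcript` rows. The helper names (`transcript_ids`, `missing_from_sample`) and the toy rows (coordinates, gene IDs `G1`/`G2`) are made up for illustration and are not part of StringTie or prepDE.py.

```python
import re

# \b keeps the pattern from matching inside longer attribute names.
TRANSCRIPT_ID = re.compile(r'\btranscript_id "([^"]+)"')


def transcript_ids(gtf_lines):
    """Collect transcript_id values from 'transcript' feature rows of a GTF."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        # A GTF row has 9 columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def missing_from_sample(reference_lines, sample_lines):
    """Return reference transcript IDs that have no row in the sample GTF."""
    return sorted(transcript_ids(reference_lines) - transcript_ids(sample_lines))


# Toy data (hypothetical coordinates and gene IDs), standing in for the
# annotation passed to -G and one per-sample output GTF.
reference = [
    'chr1\tHAVANA\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "ENST00000697250.1";',
    'chr1\tHAVANA\texon\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "ENST00000697250.1";',
    'chr2\tHAVANA\ttranscript\t100\t500\t.\t-\t.\tgene_id "G2"; transcript_id "ENST00000607096.1";',
]
sample = [
    "# header comment",
    'chr2\tStringTie\ttranscript\t100\t500\t.\t-\t.\tgene_id "G2"; transcript_id "ENST00000607096.1"; cov "1.0";',
]

print(missing_from_sample(reference, sample))  # → ['ENST00000697250.1']
```

On real data, the two lists would be replaced by open file handles, for example `open(gtf_path)` for the annotation and for each `${SAMPLE_NAME}.gtf`, so the IDs reported missing can be compared with the ones named in the errors above.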
Are there specific StringTie parameters that would help ensure more consistent detection of transcripts across samples? Is there a recommended approach for cases where transcripts appear in BAM files but are missing in StringTie’s GTF output, especially for downstream differential expression analysis with prepDE.py?
Any insights or suggested settings would be much appreciated, as I’m aiming to build a comprehensive transcript count matrix compatible with DESeq2.
Thank you!