gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Stringtie not identifying unannotated exon (annotated or de novo assembly) #384

Open dmworstell opened 2 years ago

dmworstell commented 2 years ago

Like other users, I am seeing regions with plenty of RNA-seq coverage in IGV that are not included in the assembly, even when I make the command parameters hyper-sensitive (realistically, far too sensitive to be useful).

Here's an example where an exon that likely should be in the 5' UTR of the SERPING1 gene isn't being included:

Screenshot 2022-11-08 at 4 54 25 PM

This is obtained using the following command:

stringtie -o output_stringtie.gtf -G hg38_genes.gtf -v -m 30 -a 5 -t -j 0.01 -c 0.01 -s 1 -M 1 -f 0 -g 0 -u -p 8 filename.bam

This is with StringTie version 2.2.1. The reads were mapped to hg38 with HISAT2 version 2.2.1. The reads mapping across the junction that is missing from the StringTie output do include an XS tag ("+" in this case).
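For anyone who wants to run the same check on their own data, here is a minimal sketch that scans SAM text records and reports spliced alignments (CIGAR containing `N`) that lack an `XS:A:+` or `XS:A:-` tag. The function name and the idea of feeding it SAM text are my own assumptions for illustration; this is not part of StringTie.

```python
import re

def spliced_reads_missing_strand(sam_lines):
    """Return names of spliced reads (CIGAR contains N) with no XS:A:+/- tag."""
    missing = []
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, cigar, tags = fields[0], fields[5], fields[11:]
        if "N" not in cigar:
            continue  # unspliced alignment, strand tag not needed here
        has_xs = any(re.fullmatch(r"XS:A:[+-]", tag) for tag in tags)
        if not has_xs:
            missing.append(name)
    return missing

if __name__ == "__main__":
    import sys
    # Hypothetical usage: pipe SAM text records into this script on stdin.
    for read_name in spliced_reads_missing_strand(sys.stdin):
        print(read_name)
```

The input could come from `samtools view` on the region around the junction. An empty result means every spliced read in that region carries a strand tag, which is what I see here.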

Unfortunately, unguided predictions using stringtie have the same problem:

image

Any help would be greatly appreciated.

rahil19 commented 1 year ago

I have a very similar issue, plus an additional one. Issue 1: UTR regions are not assembled when the -G option is used, despite good read depth and junction mapping in those regions. In my reference GTF, the UTR regions are not present in the transcripts; the transcripts start and end at the start and stop codons.

Issue 2: When run with the same set of parameters, no assemblies are generated at the right end of the genome. So it generates an incomplete set of transcripts, some of which are generated with the -G option.

The StringTie commands I ran, with StringTie version 2.2.1:

De novo GTF

stringtie --rf -v -p2 -f 0.0 -m 100 -o f0_m100.gtf markdup.sorted.bam &> f0_m100.stringtie.out

With Reference Guided (-G option)

stringtie --rf -v -p2 -f 0.0 -m 100 -o f0_m100_G.gtf -G NC_001538_Gencode.gtf markdup.sorted.bam &> f0_m100_G.stringtie.out
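Since the reference GTF passed to -G matters for Issue 1, here is a rough sketch for confirming which reference transcripts have no UTR, meaning their exons span exactly the same range as their coding features. The function name is hypothetical, and it assumes a standard 9-column GTF with `exon`, `CDS`, `start_codon` and `stop_codon` features and a `transcript_id` attribute.

```python
import re

CODING_FEATURES = ("CDS", "start_codon", "stop_codon")

def transcripts_without_utr(gtf_lines):
    """Return sorted transcript IDs whose exon span equals their coding span."""
    exon_span = {}
    coding_span = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        feature, start, end, attrs = fields[2], int(fields[3]), int(fields[4]), fields[8]
        match = re.search(r'transcript_id "([^"]+)"', attrs)
        if not match:
            continue
        tid = match.group(1)
        if feature == "exon":
            spans = exon_span
        elif feature in CODING_FEATURES:
            spans = coding_span
        else:
            continue
        # Widen the recorded (min start, max end) for this transcript.
        lo, hi = spans.get(tid, (start, end))
        spans[tid] = (min(lo, start), max(hi, end))
    return sorted(tid for tid, span in coding_span.items()
                  if exon_span.get(tid) == span)
```

Transcripts returned by this function are annotated only from start codon to stop codon, which is the situation in my reference GTF.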

Issue 2 display: In the figure I marked the result of the de novo run. In the right-hand region of the GTF display there is no transcript assembly.

Stringtie_issue2
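To list the difference between two runs without relying on a screenshot, here is a small sketch that reports transcripts from one GTF that overlap no transcript in another. The function names are hypothetical; it assumes both files contain `transcript` feature lines with a `transcript_id` attribute, and it ignores strand for simplicity.

```python
import re

def transcript_intervals(gtf_lines):
    """Collect (contig, start, end, transcript_id) from 'transcript' lines."""
    intervals = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            intervals.append((fields[0], int(fields[3]), int(fields[4]), match.group(1)))
    return intervals

def transcripts_unique_to(candidate_lines, other_lines):
    """Return IDs of candidate transcripts overlapping nothing in the other GTF."""
    others = transcript_intervals(other_lines)
    unique = []
    for contig, start, end, tid in transcript_intervals(candidate_lines):
        # GTF coordinates are 1-based and inclusive on both ends.
        overlaps = any(c == contig and s <= end and start <= e
                       for c, s, e, _ in others)
        if not overlaps:
            unique.append(tid)
    return unique
```

Calling `transcripts_unique_to` with the lines of f0_m100_G.gtf as the candidate and the lines of f0_m100.gtf as the other file would list transcripts reported only in the guided run.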

Issue 1 display: When run with the -G option, assembly STRG 2.1 is the same as LTAg and STRG 2.3 is the same as sTAg in the reference GTF shown below, but the assemblies stop short at the stop codon and do not include the UTR regions, despite reads mapping to the regions 5' and 3' of these transcripts' start and end.

STRG2.1

STRG2 1_assembly

STRG2.3

STRG2 3_assembly