Open krabapple opened 3 years ago
Hi @krabapple!
I am interested in the same problem. Did you solve it?
I am running into a similar issue, where areas with plenty of RNAseq coverage in IGV are not being properly included in the assembly, even when I make the command parameters hyper-sensitive (realistically, far too sensitive to be useful).
Here's an example where an exon that likely should be in the 5' UTR of the SERPING1 gene isn't being included:
This is obtained using the following command: stringtie -o output_stringtie.gtf -G hg38_genes.gtf -v -m 30 -a 5 -t -j 0.01 -c 0.01 -s 1 -M 1 -f 0 -g 0 -u -p 8 filename.bam
This is with stringtie version 2.2.1. The reads were mapped to hg38 with Hisat2 version 2.2.1, and the reads spanning the junction that is missing from the stringtie output do carry an XS tag ("+" in this case).
Unfortunately, unguided predictions using stringtie have the same problem:
Were you able to solve this problem?
I'm following up on this topic. I know it's a couple of years late, but has anyone actually solved this?
I passed an unpublished reference annotation to stringtie with the -G option. The reference gtf has no UTR data; in every case the transcript boundaries are coterminous with the 5' and 3' coordinates of the initial and terminal CDS, respectively. I was hoping stringtie could 'reveal' candidate leading and trailing UTR regions based on RNAseq read coverage, but this did not happen. In every case where stringtie assembled a transcript guided by a reference, the transcript boundary never extended beyond the reference boundaries, even when there was plentiful RNAseq read coverage there (visible in a genome browser).
Is there a way to allow guided stringtie predictions to extend beyond the gtf boundaries?
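In case it helps anyone quantify this on their own data, here is a minimal sketch (not part of stringtie) that compares each guided transcript's span against its reference transcript. It assumes the stringtie output lines carry a `reference_id` attribute naming the matching reference transcript, and that the reference gtf uses `transcript_id`. The function names `transcript_spans` and `boundary_extensions` are made up for this example.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcript_spans(gtf_lines, id_key):
    """Return {id: (start, end)} using the min start and max end
    over all feature lines that carry the attribute `id_key`."""
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if id_key not in attrs:
            continue
        start, end = int(fields[3]), int(fields[4])
        key = attrs[id_key]
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    return spans


def boundary_extensions(ref_lines, asm_lines):
    """Return {reference_id: (left_ext, right_ext)} in bp.
    Positive values mean the assembled transcript reaches past the
    reference boundary on that side (genomic coordinates, not 5'/3')."""
    ref = transcript_spans(ref_lines, "transcript_id")
    # Assumption: guided transcripts are linked via `reference_id`.
    asm = transcript_spans(asm_lines, "reference_id")
    result = {}
    for ref_id, (asm_start, asm_end) in asm.items():
        if ref_id in ref:
            ref_start, ref_end = ref[ref_id]
            result[ref_id] = (ref_start - asm_start, asm_end - ref_end)
    return result
```

With real files this would be called as `boundary_extensions(open("reference.gtf"), open("output_stringtie.gtf"))` (both file names are placeholders). If every value comes back as `(0, 0)`, that matches what I described above: the guided transcripts stop exactly at the reference boundaries.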