Hey,
I have a question regarding StringTie --merge:
I am using StringTie to assemble novel transcripts in a very small non-model genome. The genome has very few introns and mostly single-exon genes. I followed the basic reference-guided StringTie workflow and assembled transcripts from a dataset of around 60 paired-end, strand-specific RNA-seq samples (50bp) with varying read counts and conditions. In an initial run I looked for assembled transcripts I deemed novel. However, after including more data in the assembly process and merging the assembled files, many transcripts that had been predicted before the inclusion were lost.
I am running stringtie with default options:
stringtie -rf -g 5 -G reference.gtf -o /out.gtf -p 10 -l labels in.bam
followed by stringtie --merge:
stringtie --merge -p 5 -G reference.gtf -o stringtie_merged.gtf mergelist.txt
I then extracted the novel assembled transcripts that were not contained in the reference annotation (MSTRG.31 in this example).
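For reference, the extraction step can be sketched roughly as below. This is only a minimal illustration, not the exact script used: the function name `novel_transcripts` is hypothetical, and it assumes that novel transcripts in the merged GTF carry the default `MSTRG.` prefix in their `transcript_id`, while reference transcripts keep their reference IDs. A class-code comparison against the reference would be the more rigorous check.

```python
import re

# Assumption: novel transcripts carry the default "MSTRG." prefix in their
# transcript_id, while reference transcripts keep their reference IDs.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def novel_transcripts(gtf_lines, prefix="MSTRG."):
    """Collect transcript records whose transcript_id starts with `prefix`.

    Returns a list of (seqname, start, end, strand, transcript_id) tuples.
    """
    hits = []
    for line in gtf_lines:
        # Skip blank lines and GTF header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; keep transcript rows only.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match and match.group(1).startswith(prefix):
            hits.append(
                (fields[0], int(fields[3]), int(fields[4]), fields[6], match.group(1))
            )
    return hits
```

Passing an open file handle for stringtie_merged.gtf to `novel_transcripts` yields one tuple per candidate transcript, which makes it easy to compare coordinates between the per-dataset merges and the final merge.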
I then included two additional datasets, also consisting of paired-end, strand-specific data (150bp) with varying read counts and conditions. I ran stringtie --merge for each dataset separately (MSTRG.31 does not appear at its coordinates in the two additional datasets, which I expected given the different sample conditions etc.):
Additional dataset 1:
Additional dataset 2 shows two other novel transcripts (MSTRG.32 & MSTRG.33):
I expected that including all assembled GTFs in my mergelist.txt file and running stringtie --merge would result in a merged annotation that includes MSTRG.31 from my initial run as well as, for example, MSTRG.32 & MSTRG.33 from additional dataset 2. However, in the final merged GTF they are all missing at this location:
We are a bit puzzled by this behaviour. Could you explain how stringtie --merge includes or excludes assembled transcripts, and what could have happened here?
Thank you very much!