Hello @gpertea, thanks for developing stringtie!
I am working on integrating StringTie2 into a Nextflow pipeline that produces extended annotations from a user-provided reference annotation and long-read sequencing data. We use StringTie to assemble novel genes/transcripts from a BAM file, which we then add to the annotation.
I am running into an issue with RefSeq or Ensembl annotations when two overlapping reference genes have distinct gene_id values: StringTie seems to drop one of the two genes from its output. This happens, for instance, with "CHTF8" and "DERPC" (RefSeq NM_001039690 and NM_001366606, respectively; GENCODE ENSG00000168802.14 and ENSG00000286140.2).
When I run stringtie using this annotation, I only have DERPC in the output GTF. Ideally, I would like to have both genes in the output of stringtie.
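In case it helps with reproducing this, below is a minimal sketch of how the dropped genes can be listed by comparing the two GTF files. It assumes that StringTie tags transcripts matching the reference with a `ref_gene_id` attribute in its output; the function names are hypothetical and only for illustration.

```python
import re

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_attribute_values(gtf_text, key):
    """Collect every value of attribute `key` across the records of a GTF."""
    values = set()
    for line in gtf_text.splitlines():
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        for name, value in ATTR_RE.findall(fields[8]):
            if name == key:
                values.add(value)
    return values


def missing_reference_genes(reference_gtf_text, output_gtf_text,
                            output_key="ref_gene_id"):
    """Return reference gene_ids that never appear under `output_key` in the output.

    Assumption: the assembler reports the originating reference gene in the
    `output_key` attribute of its output records.
    """
    reference_ids = gtf_attribute_values(reference_gtf_text, "gene_id")
    kept_ids = gtf_attribute_values(output_gtf_text, output_key)
    return sorted(reference_ids - kept_ids)
```

Calling `missing_reference_genes` on the contents of the reference annotation and of the StringTie output GTF returns the reference gene IDs that are absent from the output.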
I am using v2.2.3 with this command:
Do you have any advice or suggestions for this case? Thank you very much!