dantaki opened 4 years ago
stringtie --merge stringtie_gtf.list -G Mus_musculus.GRCm38.84.gtf -o stringtie_merged.gtf
I ran this command on HISAT2-aligned mouse RNA-seq data and noticed that many StringTie genes in the merged GTF are linked to more than one reference gene (i.e. they carry more than one ref_gene_name).
For example:
grep "MSTRG.13821" stringtie_merged.gtf | grep "transcript\t" -P
19 StringTie transcript 4907229 4928016 1000 - . gene_id "MSTRG.13821"; transcript_id "MSTRG.13821.1";
19 StringTie transcript 4907229 4928287 1000 - . gene_id "MSTRG.13821"; transcript_id "MSTRG.13821.2";
19 StringTie transcript 4907229 4928287 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000025851"; gene_name "Dpp3"; ref_gene_id "ENSMUSG00000063904";
19 StringTie transcript 4907232 4923318 1000 - . gene_id "MSTRG.13821"; transcript_id "MSTRG.13821.4";
19 StringTie transcript 4907233 4943127 1000 - . gene_id "MSTRG.13821"; transcript_id "MSTRG.13821.5";
19 StringTie transcript 4929746 4943127 1000 - . gene_id "MSTRG.13821"; transcript_id "MSTRG.13821.6";
19 StringTie transcript 4930651 4943127 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000120475"; gene_name "Peli3"; ref_gene_id "ENSMUSG00000024901";
19 StringTie transcript 4931856 4943127 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000025834"; gene_name "Peli3"; ref_gene_id "ENSMUSG00000024901";
19 StringTie transcript 4934986 4938628 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000139436"; gene_name "Peli3"; ref_gene_id "ENSMUSG00000024901";
19 StringTie transcript 4935012 4943127 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000146289"; gene_name "Peli3"; ref_gene_id "ENSMUSG00000024901";
19 StringTie transcript 4935016 4943127 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000133254"; gene_name "Peli3"; ref_gene_id "ENSMUSG00000024901";
19 StringTie transcript 4935023 4943127 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000133504"; gene_name "Peli3"; ref_gene_id "ENSMUSG00000024901";
19 StringTie transcript 4941600 4943127 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000143930"; gene_name "Peli3"; ref_gene_id "ENSMUSG00000024901";

grep "Dpp3" founder_stringtie_merged.gtf | grep "transcript\t" -P
19 StringTie transcript 4907229 4928287 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000025851"; gene_name "Dpp3"; ref_gene_id "ENSMUSG00000063904";

grep "Peli3" founder_stringtie_merged.gtf | grep "transcript\t" -P
19 StringTie transcript 4930651 4943127 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000120475"; gene_name "Peli3"; ref_gene_id "ENSMUSG00000024901";
19 StringTie transcript 4931856 4943127 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000025834"; gene_name "Peli3"; ref_gene_id "ENSMUSG00000024901";
19 StringTie transcript 4934986 4938628 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000139436"; gene_name "Peli3"; ref_gene_id "ENSMUSG00000024901";
19 StringTie transcript 4935012 4943127 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000146289"; gene_name "Peli3"; ref_gene_id "ENSMUSG00000024901";
19 StringTie transcript 4935016 4943127 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000133254"; gene_name "Peli3"; ref_gene_id "ENSMUSG00000024901";
19 StringTie transcript 4935023 4943127 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000133504"; gene_name "Peli3"; ref_gene_id "ENSMUSG00000024901";
19 StringTie transcript 4941600 4943127 1000 - . gene_id "MSTRG.13821"; transcript_id "ENSMUST00000143930"; gene_name "Peli3"; ref_gene_id "ENSMUSG00000024901";
So there are no other annotations for Dpp3 or Peli3, and if I use the merged GTF for downstream analysis I cannot distinguish between the two genes, because they share the same gene_id (MSTRG.13821).
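To gauge how widespread this is, one option is to count, for each StringTie gene_id, how many distinct ref_gene_id values its transcripts carry. The sketch below is a hypothetical bash/awk helper (`multi_ref_genes` is an illustrative name, not a StringTie feature); it assumes the merged GTF is tab-delimited with attributes written as `key "value";`, as in the grep output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of StringTie: print each gene_id in a merged GTF
# whose transcripts carry more than one distinct ref_gene_id, with that count.
# Assumes a tab-delimited GTF whose attributes look like: key "value";
multi_ref_genes() {
  awk -F'\t' '
    $3 == "transcript" {
      gid = ""; rid = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        a = attrs[i]
        sub(/^ +/, "", a)                  # drop the space after each ";"
        key = a; sub(/ .*/, "", key)       # attribute name
        val = a; sub(/^[^ ]+ /, "", val)   # attribute value ...
        gsub(/"/, "", val)                 # ... without the quotes
        if (key == "gene_id") gid = val
        else if (key == "ref_gene_id") rid = val
      }
      # Count each (gene_id, ref_gene_id) pair only once.
      if (rid != "" && !((gid, rid) in seen)) {
        seen[gid, rid] = 1
        nref[gid]++
      }
    }
    END { for (g in nref) if (nref[g] > 1) print g, nref[g] }
  ' "$1" | sort
}

# Usage (file name taken from the command above):
# multi_ref_genes stringtie_merged.gtf
```

For the locus shown above, MSTRG.13821 should be reported with a count of 2 (Dpp3 and Peli3).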
Is the solution simply to use the ref_gene_id in place of the gene_id?
Thank you
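One way to try the ref_gene_id idea is sketched below: a hypothetical bash/awk helper (`relabel_gene_ids` is an illustrative name, not part of StringTie) that copies each transcript's ref_gene_id into the gene_id field of that transcript's lines (transcript and exon) and leaves transcripts without a ref_gene_id untouched. It assumes a tab-delimited GTF with `key "value";` attributes. Note that in the example above, novel models such as MSTRG.13821.5 (4907233-4943127) span both Dpp3 and Peli3, so they would stay under MSTRG.13821 rather than being assigned to either gene; whether that grouping is acceptable downstream is a judgment call.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of StringTie: replace gene_id with the
# transcript's ref_gene_id wherever one exists. Transcripts without a
# ref_gene_id (novel MSTRG.* models) keep their StringTie gene_id.
# Assumes a tab-delimited GTF whose attributes look like: key "value";
relabel_gene_ids() {
  awk -F'\t' -v OFS='\t' '
    # Return the unquoted value of attribute k from column 9, or "".
    function attr(s, k,    n, i, a, parts) {
      n = split(s, parts, ";")
      for (i = 1; i <= n; i++) {
        a = parts[i]; sub(/^ +/, "", a)
        if (index(a, k " ") == 1) {
          sub(/^[^ ]+ /, "", a); gsub(/"/, "", a)
          return a
        }
      }
      return ""
    }
    # Pass 1: map transcript_id -> ref_gene_id from the "transcript" lines.
    NR == FNR {
      if ($3 == "transcript") {
        r = attr($9, "ref_gene_id")
        if (r != "") ref[attr($9, "transcript_id")] = r
      }
      next
    }
    # Pass 2: copy header lines as-is and rewrite gene_id on every line
    # (transcript and exon) that belongs to a mapped transcript.
    /^#/ { print; next }
    {
      t = attr($9, "transcript_id")
      if (t in ref) {
        n = split($9, parts, ";"); out = ""
        for (i = 1; i <= n; i++) {
          a = parts[i]; b = a; sub(/^ +/, "", b)
          if (index(b, "gene_id ") == 1) {
            lead = a; sub(/[^ ].*/, "", lead)   # keep the original leading space
            a = lead "gene_id \"" ref[t] "\""
          }
          out = out a
          if (i < n) out = out ";"
        }
        $9 = out
      }
      print
    }
  ' "$1" "$1"
}

# Usage (output file name is only an example):
# relabel_gene_ids stringtie_merged.gtf > stringtie_merged.relabeled.gtf
```

The file is read twice (`NR == FNR` marks the first pass) so that exon lines, which may not carry ref_gene_id themselves, still pick up the new gene_id through their transcript_id.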