Closed by alex-seitz 8 months ago
I will have a look at it ASAP.
Here is another example that possibly boils down to the same problem: the gene DDX11L2 (ENSG00000236397) has two transcripts on two different chromosomes in NGSD:

ensgid           version  source   chromosome  strand  biotype
ENST00000437401  1        ensembl  2           -       unprocessed pseudogene
ENST00000456328  2        ensembl  1           +       lncRNA

In Ensembl they map to two different genes: ENSG00000236397 and ENSG00000290825.
Fixed in commit: bdd495a9473f70c1384e1345978a113e9b84a2d6
There are still genes with transcripts on several chromosomes: ANKRD20A5P, DDX11L16, DDX11L2, LSP1P5, RPL23AP7, SNORA62, SNORA63, SNORA70, SNORA72, SNORA75, SNORD27, SNORD30, SNORD33, SNORD63, SNORD81
However, they are not easy to fix, because they are caused by several Ensembl genes that share the same HGNC-approved gene name, e.g.:
www.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000456328
http://www.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000236397;r=2:113599036-113601261;t=ENST00000437401
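For anyone who wants to reproduce the list of affected genes, here is a minimal sketch of the check, assuming the transcripts are available as (gene symbol, transcript ID, chromosome) tuples. The function name and the input layout are hypothetical and are not taken from the NGSD code base. The DDX11L2 rows come from the example in this thread; the third row is made up to show a gene that is not flagged.

```python
from collections import defaultdict


def genes_on_multiple_chromosomes(transcripts):
    """Return {gene symbol: sorted chromosomes} for every symbol whose
    transcripts are located on more than one chromosome."""
    chromosomes = defaultdict(set)
    for symbol, _transcript_id, chromosome in transcripts:
        chromosomes[symbol].add(chromosome)
    return {s: sorted(c) for s, c in chromosomes.items() if len(c) > 1}


# Assumed input layout: (gene symbol, transcript ID, chromosome).
rows = [
    ("DDX11L2", "ENST00000437401", "2"),  # from the example in this thread
    ("DDX11L2", "ENST00000456328", "1"),  # from the example in this thread
    ("GENE_A", "ENST_HYPOTHETICAL", "7"),  # made-up row, single chromosome
]

flagged = genes_on_multiple_chromosomes(rows)
print(flagged)
```

A check like this only reports the symptom. It cannot tell whether the cause is a parsing error or several Ensembl genes sharing one approved name.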
Hi, we noticed an error in the parsing of the Ensembl data into the NGSD database. There are several genes in the GFF file named U8.1 up to U8.22. These are different genes on different chromosomes. Apparently the gene SNORD118 has the alias U8, and somehow all of these transcripts are matched with SNORD118. Additionally, the gene U8 appears more than once in Ensembl, e.g. under the following IDs: ENSG00000199713, ENSG00000239148.
Is there any way to remedy this?
Best, Alex
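One possible way to guard against this kind of alias collision is sketched below. It is not how the NGSD import works and not necessarily what the fix commit does. It assumes gene records arrive as (Ensembl gene ID, gene name) pairs, and it accepts an alias-based match only when exactly one Ensembl gene reaches the approved symbol through an alias. The function name, the input layout, and the policy itself are hypothetical. The IDs and the U8/SNORD118 alias are the ones mentioned in this issue.

```python
from collections import defaultdict


def assign_symbols(records, approved, aliases):
    """Map each Ensembl gene ID to an approved symbol, or to None when the
    name is unknown or the alias-based match is ambiguous."""
    assigned = {}
    via_alias = defaultdict(list)  # approved symbol -> gene IDs matched only by alias
    for gene_id, name in records:
        if name in approved:
            assigned[gene_id] = name  # exact match on the approved symbol
        elif name in aliases:
            via_alias[aliases[name]].append(gene_id)
        else:
            assigned[gene_id] = None  # name is neither approved nor an alias
    for symbol, gene_ids in via_alias.items():
        # Accept the alias match only if a single Ensembl gene claims it.
        unique = len(gene_ids) == 1
        for gene_id in gene_ids:
            assigned[gene_id] = symbol if unique else None
    return assigned


# Assumed input layout: (Ensembl gene ID, gene name from the annotation).
records = [("ENSG00000199713", "U8"), ("ENSG00000239148", "U8")]
result = assign_symbols(records, approved={"SNORD118"}, aliases={"U8": "SNORD118"})
print(result)
```

With this policy both U8 genes stay unassigned instead of being merged into SNORD118. Whether dropping ambiguous matches is acceptable depends on how the gene table is used downstream.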