Closed by pesho-ivanov 6 years ago
Yes, interesting cases where annotations differ. I think this has been seen before.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339237/
https://www.biostars.org/p/16505/
I'm pulling data directly from Ensembl. What might be interesting is to take all the cases where a single Ensembl ID maps to multiple Entrez IDs, use those Entrez IDs together with the Bioconductor annotation packages to get the corresponding gene symbols, and then look at the cases where the gene symbols differ. Count them and try to explain why.
I suspect mistakes in gene symbols.
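The comparison suggested above could be sketched roughly as follows. This is only an illustration in plain Python: the rows are made-up placeholders (not real annotation data), the function name `symbols_by_multi_entrez` is hypothetical, and the `symbol` field stands in for whatever symbol an Entrez-based lookup (for example via a Bioconductor annotation package) would return.

```python
from collections import defaultdict

# Hypothetical rows: (ensgene, entrez, symbol looked up from the Entrez ID).
# Placeholder values only; these are not real identifiers.
rows = [
    ("ENSG_A", 1, "GENE1"),
    ("ENSG_A", 2, "GENE1"),
    ("ENSG_B", 3, "GENE2"),
    ("ENSG_B", 4, "GENE2L"),
    ("ENSG_C", 5, "GENE3"),
]

def symbols_by_multi_entrez(rows):
    """Map each Ensembl ID that has more than one Entrez ID to its set of symbols."""
    entrez = defaultdict(set)
    symbols = defaultdict(set)
    for ensgene, entrez_id, symbol in rows:
        entrez[ensgene].add(entrez_id)
        symbols[ensgene].add(symbol)
    return {g: symbols[g] for g in entrez if len(entrez[g]) > 1}

# Ensembl IDs with several Entrez IDs, and the subset whose symbols disagree.
multi = symbols_by_multi_entrez(rows)
conflicting = {g: s for g, s in multi.items() if len(s) > 1}

print(len(multi), "Ensembl IDs map to multiple Entrez IDs")
print(len(conflicting), "of those have differing gene symbols")
```

The `conflicting` entries are the ones worth inspecting by hand, since they are the candidates for symbol mistakes.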
I was wrongly assuming a one-to-one correspondence between the rows in grch38/grch37 and the distinct ENSGs. It turned out that there are `ensgene` repetitions:
I checked several of these 361 duplicated genes and it seems that the Entrez gene IDs are the only difference:
I also looked up the different Entrez gene IDs on the NCBI website, and they point to different CALM genes (not only CALM1).
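A check like the one described here, finding repeated `ensgene` values and seeing which columns actually differ between the repeated rows, might look like the sketch below. It assumes the table has been loaded as a list of dicts; the column names other than `ensgene` and all of the values are placeholders, and the helper names are hypothetical.

```python
from collections import Counter

def duplicated_ids(rows, key="ensgene"):
    """Return the set of key values that occur in more than one row."""
    counts = Counter(r[key] for r in rows)
    return {k for k, n in counts.items() if n > 1}

def differing_columns(rows, key="ensgene"):
    """For each repeated key, list the columns whose values differ across its rows."""
    result = {}
    for k in duplicated_ids(rows, key):
        group = [r for r in rows if r[key] == k]
        result[k] = sorted(
            col for col in group[0] if len({r[col] for r in group}) > 1
        )
    return result

# Placeholder rows, not real annotation data.
table = [
    {"ensgene": "ENSG_A", "entrez": 1, "symbol": "GENE1"},
    {"ensgene": "ENSG_A", "entrez": 2, "symbol": "GENE1"},
    {"ensgene": "ENSG_B", "entrez": 3, "symbol": "GENE2"},
]

print(len(duplicated_ids(table)), "repeated ensgene value(s)")
print(differing_columns(table))
```

If only the Entrez column shows up in the output for every repeated ID, that matches the observation that the Entrez gene IDs are the only difference.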
Version: