mistrm82 closed 5 years ago
Re: Multiple entrezIDs mapping to a single Ensembl ID
“As for differences between Ensembl and EntrezGene, it was already mentioned in this thread that the CCDS set was constructed to come up with a more unified gene set. Ensembl, UCSC, NCBI and Havana all take part in forming and agreeing on the consensus coding sequences in this set, which currently exists for human and mouse. The latest update, in Sept 2011, shows there are 26,473 CCDS IDs in Human corresponding to 18,471 gene IDs. (CCDS can be splice variants of one gene; ie more than one CCDS can be assigned to a gene).
As for matches between Ensembl and EntrezGene, we know that for the human Ensembl gene set, we have 21,184 links to EntrezGene. We try to get a perfect match when possible. Out of these 21,184 links, 504 genes have more than one EntrezGene entry associated with them. This occurs when we cannot choose a perfect match; ie when we have two good matches, but one does not appear to match with a better percentage than the other. In that case, we assign both matches to the gene/transcript.”
Source: https://www.biostars.org/p/16505/
Changed: a data frame is now created for ahb, and we can choose whether or not to use it for FA.
The problem we encountered when trying to switch to AnnotationHub is that the Ensembl-to-Entrez mappings are one-to-many and the Entrez IDs are stored as a list. Here is some code that will work if we choose to change it. If we do change, it would be worth exploring the differences between these and Ensv86 using AnnotationDbi.
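To make the list-column issue concrete, here is a minimal base R sketch of two ways to handle it. This is an illustration, not the snippet from the thread. The data frame, its column names (`gene_id`, `entrezid`), and all IDs are hypothetical stand-ins for an annotation table in which the Entrez IDs are stored as a list.

```r
# Toy stand-in for an annotation table whose Entrez column is a list,
# mimicking one-to-many Ensembl-to-Entrez mappings. All IDs are made up.
annotations <- data.frame(
  gene_id = c("ENSG_A", "ENSG_B", "ENSG_C"),
  stringsAsFactors = FALSE
)
annotations$entrezid <- list(c(101L, 102L), 201L, NA_integer_)

# Number of Entrez IDs attached to each Ensembl gene.
n_per_gene <- lengths(annotations$entrezid)

# Option 1: expand to one row per Ensembl/Entrez pair.
long <- data.frame(
  gene_id = rep(annotations$gene_id, times = n_per_gene),
  entrezid = unlist(annotations$entrezid),
  stringsAsFactors = FALSE
)

# Option 2: keep one row per Ensembl gene by taking the first Entrez ID.
first_only <- data.frame(
  gene_id = annotations$gene_id,
  entrezid = vapply(annotations$entrezid, function(x) x[1], integer(1)),
  stringsAsFactors = FALSE
)

# Ensembl genes that map to more than one Entrez ID.
multi_mapped <- annotations$gene_id[n_per_gene > 1]
```

Option 1 keeps every mapping but duplicates Ensembl IDs, which matters if the table is later joined to results keyed on Ensembl ID. Option 2 keeps one row per gene but picks a match arbitrarily, so `multi_mapped` is worth inspecting before deciding which to use.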