Closed boboppie closed 10 years ago
TODO: create a URL - human.intermine.org
Installed bioseg.
TODO: need a human identifier resolver that covers NCBI, Ensembl and HGNC, and resolves any id to an NCBI id.
The human id resolver will miss some (possibly many) transcripts (mRNA, etc.) from Ensembl, since there are no equivalent entities in NCBI/HGNC (the gene models differ). E.g. ENST00000000233 (http://www.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000004059;r=7:127228399-127231759;t=ENST00000000233): the gene (ENSG00000004059) has 5 other transcript products, but only ENST00000000233 has a CCDS id (CCDS34745). The same holds in HGNC (http://www.genenames.org/data/hgnc_data.php?hgnc_id=658): only this transcript can be resolved by CCDS id (not a 100% match), so the other 5 Ensembl entities will be lost.
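To make the resolver idea concrete, here is a minimal sketch of what "resolve any id to an NCBI id" could look like, and how transcripts without a cross-reference fall out. The class `HumanIdResolver`, its methods, and the `NCBI_EXAMPLE` value are hypothetical placeholders, not InterMine's actual resolver API or a real NCBI id.

```python
from typing import Dict, Iterable, List, Optional


class HumanIdResolver:
    """Hypothetical sketch: maps Ensembl/HGNC/NCBI ids to one NCBI gene id."""

    def __init__(self) -> None:
        # Every known identifier (from any source) points at an NCBI id.
        self._to_ncbi: Dict[str, str] = {}

    def add_xref(self, ncbi_id: str, *other_ids: str) -> None:
        """Register an NCBI id together with its equivalent ids."""
        self._to_ncbi[ncbi_id] = ncbi_id
        for other in other_ids:
            self._to_ncbi[other] = ncbi_id

    def resolve(self, identifier: str) -> Optional[str]:
        """Return the NCBI id, or None when no cross-reference exists."""
        return self._to_ncbi.get(identifier)

    def unresolved(self, identifiers: Iterable[str]) -> List[str]:
        """List the identifiers that would be lost on loading."""
        return [i for i in identifiers if i not in self._to_ncbi]


resolver = HumanIdResolver()
# "NCBI_EXAMPLE" is a placeholder, not the real NCBI id for this gene.
resolver.add_xref("NCBI_EXAMPLE", "ENSG00000004059", "HGNC:658")

# The gene resolves; a transcript with no xref registered does not.
gene_hit = resolver.resolve("ENSG00000004059")
lost = resolver.unresolved(["ENSG00000004059", "ENST00000000233"])
```

In this sketch the transcript is reported as unresolved because only gene-level cross-references were registered; adding CCDS-based mappings would be one way to recover the single matching transcript, with the caveat above that the match is not exact.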
Exons/CDSs don't have ids/names in GenBank, but Ensembl has internal ids for them. How to resolve? Discard?
ncbi-summary resolves Entrez ids to HGNC symbols, which causes the loss of 134 genes (mostly microRNAs), e.g. http://www.ncbi.nlm.nih.gov/gene/100526648
The gene id is ENSG00000198888. In protein-atlas we set "gene" as the default SOTerm, but ensembl-human uses a different one:
Give ensembl-human a higher priority.
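As a rough illustration of what the priority change means when two sources disagree on a field, the sketch below picks the value from the highest-priority source that supplies one. The function `pick_by_priority` and the `ENSEMBL_SO_TERM` value are hypothetical; this is not how InterMine's priority configuration is written, only the selection rule it is assumed to apply.

```python
from typing import Dict, List, Optional


def pick_by_priority(values_by_source: Dict[str, str],
                     priority: List[str]) -> Optional[str]:
    """Return the value from the first source in `priority` that has one."""
    for source in priority:
        if source in values_by_source:
            return values_by_source[source]
    return None


# Conflicting SO terms for the same gene. "ENSEMBL_SO_TERM" is a
# placeholder: the actual term used by ensembl-human is not given here.
so_terms = {
    "protein-atlas": "gene",
    "ensembl-human": "ENSEMBL_SO_TERM",
}

# ensembl-human listed first, so its value wins the conflict.
chosen = pick_by_priority(so_terms, ["ensembl-human", "protein-atlas"])
```

With the order reversed, the protein-atlas default "gene" would be kept instead, which is the conflict described above.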