ohsu-comp-bio / g2p-aggregator

Associations of genomic features, drugs and diseases

Not all TCGA codes are resolved by calls to normalize() #113

Closed: prismofeverything closed this issue 6 years ago

prismofeverything commented 6 years ago

A number of TCGA disease codes could not be resolved by calls to normalize():

In [1]: import disease_normalizer

In [2]: disease_normalizer.normalize('LUAD')
Out[2]: 
[{'family': u'lung cancer',
  'label': 'lung adenocarcinoma',
  'ontology_term': 'DOID:3910',
  'source': 'http://purl.obolibrary.org/obo/doid'}]

In [3]: disease_normalizer.normalize('KIRP')
Out[3]: []

Codes I found that could not be resolved:

TGCT
PCPG
LGG
COADREAD
LIHC
MESO
KIRC
KIRP
SKCM
DLBC
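A list like this can be reproduced by running each code through normalize() and keeping the ones that come back empty. The sketch below takes the normalizer as a parameter so it runs without disease_normalizer installed; `find_unresolved` and the stub normalizer are hypothetical, and the only behavior assumed is the one shown in the session above: a list of match dicts, empty when nothing resolves.

```python
from typing import Callable, Dict, Iterable, List


# Hypothetical helper: `normalize` is any callable that, like
# disease_normalizer.normalize() above, returns a list of match dicts
# and an empty list when a code cannot be resolved.
def find_unresolved(codes: Iterable[str],
                    normalize: Callable[[str], List[Dict]]) -> List[str]:
    """Return the codes for which normalize() produced no matches."""
    return [code for code in codes if not normalize(code)]


if __name__ == "__main__":
    # Stub standing in for disease_normalizer.normalize(); it only knows LUAD.
    known = {
        "LUAD": [{"family": "lung cancer",
                  "label": "lung adenocarcinoma",
                  "ontology_term": "DOID:3910",
                  "source": "http://purl.obolibrary.org/obo/doid"}],
    }

    def stub(code: str) -> List[Dict]:
        return known.get(code, [])

    print(find_unresolved(["LUAD", "KIRP", "SKCM"], stub))  # ['KIRP', 'SKCM']
```

With the real library, disease_normalizer.normalize would be passed as the second argument instead of the stub.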

Note that this was run without BIOONTOLOGY_API_KEY set, since I'm not sure where to acquire one. Does that API fill in the missing codes?
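Assuming BIOONTOLOGY_API_KEY holds an NCBO BioPortal API key (BioPortal issues one per registered account), one way to check whether that service knows a given code, independently of disease_normalizer, is to query BioPortal's REST search endpoint directly. The sketch below only builds the request URL. Whether disease_normalizer calls BioPortal this way is an assumption, and `bioportal_search_url` is a hypothetical helper.

```python
import os
from typing import Optional, Sequence
from urllib.parse import urlencode

# BioPortal REST search endpoint (assumption: this is the service that
# BIOONTOLOGY_API_KEY unlocks; disease_normalizer's own usage may differ).
BIOPORTAL_SEARCH = "https://data.bioontology.org/search"


def bioportal_search_url(term: str,
                         ontologies: Sequence[str] = ("DOID",),
                         api_key: Optional[str] = None) -> str:
    """Build a BioPortal search URL for `term` (hypothetical helper)."""
    # Fall back to the environment variable mentioned in the issue.
    key = api_key if api_key is not None else os.environ.get("BIOONTOLOGY_API_KEY")
    if not key:
        raise RuntimeError("BIOONTOLOGY_API_KEY is not set")
    params = {"q": term, "apikey": key}
    if ontologies:
        # Restrict results to the given ontology acronyms, e.g. DOID.
        params["ontologies"] = ",".join(ontologies)
    return BIOPORTAL_SEARCH + "?" + urlencode(params)
```

Fetching the resulting URL (for example with urllib.request.urlopen) should return JSON whose `collection` list holds the matching classes; an empty collection for a term such as KIRP would suggest the key alone does not fill the gap.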

bwalsh commented 6 years ago

An upcoming PR will resolve these codes:

COADREAD --> [{'source': 'http://purl.obolibrary.org/obo/doid', 'ontology_term': 'DOID:0050861', 'family': u'intestinal cancer', 'label': 'colorectal adenocarcinoma'}]
DLBC --> [{'source': 'http://purl.obolibrary.org/obo/doid', 'ontology_term': 'DOID:0050745', 'family': u'lymphoma', 'label': 'diffuse large B-cell lymphoma'}]
KIRC --> [{'source': 'http://purl.obolibrary.org/obo/doid', 'ontology_term': 'DOID:4471', 'family': u'kidney cancer', 'label': 'chromophobe renal cell carcinoma'}]
KIRP --> [{'source': 'http://purl.obolibrary.org/obo/doid', 'ontology_term': 'DOID:4467', 'family': u'kidney cancer', 'label': 'clear cell renal cell carcinoma'}]
LGG --> [{'source': 'http://purl.obolibrary.org/obo/doid', 'ontology_term': 'DOID:0060108', 'family': u'central nervous system cancer', 'label': 'brain glioma'}]
LIHC --> [{'source': 'http://purl.obolibrary.org/obo/doid', 'ontology_term': 'DOID:684', 'family': u'liver cancer', 'label': 'hepatocellular carcinoma'}]
MESO --> [{'source': 'http://purl.obolibrary.org/obo/doid', 'ontology_term': 'DOID:2645', 'family': u'cell type benign neoplasm', 'label': 'benign mesothelioma'}]
PCPG --> [{'source': 'http://purl.obolibrary.org/obo/doid', 'ontology_term': 'DOID:0050892', 'family': u'adrenal gland cancer', 'label': 'adrenal gland pheochromocytoma;Paraganglioma'}, {'source': 'http://purl.obolibrary.org/obo/doid', 'ontology_term': 'DOID:0050773', 'family': u'endocrine gland cancer', 'label': 'adrenal gland pheochromocytoma;Paraganglioma'}]
SKCM --> [{'source': 'http://purl.obolibrary.org/obo/doid', 'ontology_term': 'DOID:8923', 'family': u'integumentary system cancer', 'label': 'skin melanoma'}]
TGCT --> [{'source': 'http://purl.obolibrary.org/obo/doid', 'ontology_term': 'DOID:5557', 'family': u'male reproductive organ cancer', 'label': 'testicular germ cell cancer'}]
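The same kind of mapping can also be applied on the caller's side, as a static lookup table consulted only when normalize() returns nothing. The sketch below is a hypothetical wrapper, not the PR's implementation: `TCGA_FALLBACK` and `normalize_with_fallback` are illustrative names, and the table holds only a few entries copied from the mappings listed above.

```python
from typing import Callable, Dict, List

DOID_SOURCE = "http://purl.obolibrary.org/obo/doid"

# Illustrative subset of TCGA codes mapped to the Disease Ontology terms
# listed in the comment above (hypothetical table, not the PR's code).
TCGA_FALLBACK: Dict[str, List[Dict[str, str]]] = {
    "COADREAD": [{"source": DOID_SOURCE, "ontology_term": "DOID:0050861",
                  "family": "intestinal cancer",
                  "label": "colorectal adenocarcinoma"}],
    "LIHC": [{"source": DOID_SOURCE, "ontology_term": "DOID:684",
              "family": "liver cancer",
              "label": "hepatocellular carcinoma"}],
    "SKCM": [{"source": DOID_SOURCE, "ontology_term": "DOID:8923",
              "family": "integumentary system cancer",
              "label": "skin melanoma"}],
}


def normalize_with_fallback(code: str,
                            normalize: Callable[[str], List[Dict]]) -> List[Dict]:
    """Try normalize() first; use the static TCGA table only if it finds nothing."""
    matches = normalize(code)
    if matches:
        return matches
    # Upper-case the code so 'lihc' and 'LIHC' hit the same entry, and
    # return copies so callers cannot mutate the shared table.
    return [dict(m) for m in TCGA_FALLBACK.get(code.upper(), [])]
```

Keeping the table as a fallback rather than a first lookup means a term that normalize() already resolves is never overridden by the static entries.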