ohsu-comp-bio / g2p-aggregator

Associations of genomic features, drugs and diseases

cgi importer requires dictionary for phenotype #39

Closed ahwagner closed 7 years ago

ahwagner commented 7 years ago

Currently, CGI uses abbreviations that are not recognized by our normalization routines as phenotypes, which leads to high normalization failure rates.

Contact David Tamborero for mappings.
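
A dictionary lookup of this kind could be sketched as follows. This is only an illustration, not the importer's actual code: the names `CGI_PHENOTYPE_MAP` and `expand_phenotype` are hypothetical, and the two entries are placeholders, since the real mappings still have to come from the CGI maintainers.

```python
# Hypothetical mapping from CGI tumor-type labels to names that a
# phenotype normalizer may recognize. The entries are illustrative
# placeholders, NOT the real CGI mappings.
CGI_PHENOTYPE_MAP = {
    "Ovary": "ovarian cancer",
    "Renal": "renal carcinoma",
}


def expand_phenotype(label, mapping=CGI_PHENOTYPE_MAP):
    """Return the mapped name for a CGI label, or the label itself if unmapped."""
    key = label.strip()
    return mapping.get(key, key)
```

Unmapped labels pass through unchanged, so a missing dictionary entry shows up as a normalization failure downstream rather than being silently dropped.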

bwalsh commented 7 years ago

We are having trouble matching the following Primary Tumor type values to terms in the disease ontology.

For example, should entries separated by semicolons (e.g. Liposarcoma;Lymphoma;Any cancer type) be read as (Liposarcoma OR Lymphoma OR Any cancer type)?

association.phenotype.description.keyword: Descending Count
Any cancer type 142
Non-small cell lung 66
Gastrointestinal stromal 60
Ovary 29
Malignant peripheral nerve sheat tumor 18
Lung squamous cell 17
Renal 16
Thyroid 12
Bladder BLCA 10
Head an neck 8
Myelodisplasic proliferative syndrome 8
Breast;Any cancer type 7
Head an neck squamous 7
Thymic 7
Billiary tract 6
Glioma;Any cancer type 6
Non-small cell lung;Lagerhans cell histiocytosis;Erdheim-Chester histiocytosis 6
Prostate adenocarcinoma PRAD 6
Hepatic carcinoma 5
Chronic myeloid leukemia;Acute lymphoblastic leukemia 4
Liposarcoma;Lymphoma;Any cancer type 4
Malignant rhabdoid tumor 4
Mesothelioma 4
B cell lymphoma 3
Hematologic malignancies 3
Megakaryoblastic leukemia 3
Glioma;Leukemia 2
Glioma;Malignant peripheral nerve sheat tumor;Leukemia 2
Head an neck;Salivary glands 2
Malignant astrocytoma 2
Mesothelioma;Ovary 2
Myelodisplasic syndrome;Myelodisplasic proliferative syndrome 2
Non-small cell lung;Lung adenocarcinoma 2
Ovary;Any cancer type 2
Plexiform neurofibroma;Malignant peripheral nerve sheat tumor 2
Renal;Any cancer type 2
Schwannoma;Meningioma 2
Schwannoma;Neurofibroma 2
Stomach;Gastroesophageal junction adenocarcinoma 2
Thyroid;Glioma;Lung;Ovary;Breast;Any cancer type;Endometrium 2
Acute myeloid leukemia;Acute lymphoblastic leukemia;Myelodisplasic syndrome 1
Acute myeloid leukemia;Cervix;Ovary 1
Acute myeloid leukemia;Lung adenocarcinoma;Acute lymphoblastic leukemia 1
Acute myeloid leukemia;Myelodisplasic proliferative syndrome 1
Angiosarcoma;Renal 1
Basal cell carcinoma;Medulloblastoma 1
Bladder;Head an neck;Lung 1
Breast;Lung adenocarcinoma 1
Breast;Ovary;Cervix squamous cell;Endometrium 1
Breast;Stomach 1
Cervix 1
Cervix squamous cell 1
Colorectal adenocarcinoma COREAD 1
Colorectal adenocarcinoma;Inflammatory myofibroblastic 1
Cutaneous melanoma;Lung adenocarcinoma;Prostate adenocarcinoma 1
Cutaneous melanoma;Renal;Any cancer type BLCA 1
Cutaneous melanoma;Thyroid 1
Endometrium;Lung 1
Endometrium;Myeloma 1
Female germ cell tumor;Male germ cell tumor 1
Gastrointestinal stromal;Myelodisplasic syndrome;Myelodisplasic proliferative syndrome;Hyper eosinophilic advanced snydrome;Eosinophilic chronic leukemia;Chronic myeloid leukemia;Acute lymphoblastic leukemia;Systemic mastocytosis 1
Giant cell astrocytoma 1
Glioma;Thyroid carcinoma 1
Hyper eosinophilic advanced snydrome 1
Hyper eosinophilic advanced snydrome;Eosinophilic chronic leukemia 1
Inflammatory myofibroblastic 1
Inflammatory myofibroblastic;Thyroid carcinoma 1
Lung adenocarcinoma;Cutaneous melanoma;Prostate adenocarcinoma 1
Lung adenocarcinoma;Hairy-Cell leukemia;Myeloma 1
Lung adenocarcinoma;Stomach 1
Lung adenocarcinoma;Thyroid 1
Lung;Billiary tract 1
Lung;Colorectal adenocarcinoma 1
Lymphoma;Glioblastoma 1
Male germ cell tumor 1
Mantle cell lymphoma;Chronic lymphocytic leukemia 1
Myeloma;Neuroblastoma 1
Non-small cell lung;Colorectal adenocarcinoma 1
Pediatric glioma 1
Prostate adenocarcinoma PR 1
Prostate adenocarcinoma;Any cancer type 1
Prostate adenocarcinoma;Pancreas adenocarcinoma 1
Renal R 1
Renal angiomyolipoma RA 1
Renal angiomyolipoma;Giant cell astrocytoma RA 1
Sarcoma;Stomach 1
Solid tumors 1
Stomach;Adrenal gland 1
Stomach;Prostate adenocarcinoma 1
Urinary tract carcinoma 1

DavidTamborero commented 7 years ago

Yes, tumor types separated by ';' mean that the corresponding biomarker works equally (same effect, level of evidence, etc.) for each of these tumors.
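
Given that reading, one way to handle the multi-valued entries is to split each value on ';' and emit one association per tumor type. The sketch below assumes a flat association dict with a `phenotype` key; that shape and the function names `split_tumor_types` and `expand_association` are hypothetical, not the aggregator's actual schema.

```python
def split_tumor_types(value):
    """Split a CGI 'Primary Tumor type' string on ';' into individual tumor types."""
    return [part.strip() for part in value.split(";") if part.strip()]


def expand_association(association):
    """Yield one copy of the association per tumor type.

    Assumes (hypothetically) that the association is a flat dict whose
    'phenotype' key holds the raw CGI string; all other fields are copied
    unchanged, since the biomarker applies equally to each tumor type.
    """
    for tumor_type in split_tumor_types(association["phenotype"]):
        expanded = dict(association)
        expanded["phenotype"] = tumor_type
        yield expanded
```

For example, `split_tumor_types("Liposarcoma;Lymphoma;Any cancer type")` produces three separate tumor types, each of which can then be normalized on its own.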

bwalsh commented 7 years ago

Conversation continues at https://github.com/ohsu-comp-bio/g2p-aggregator/issues/48