Closed bschilder closed 8 months ago
I've applied for full data access to OMIM, but in the meantime I might be able to get this info from BioPortal:
https://bioportal.bioontology.org/ontologies/OMIM https://bioportal.bioontology.org/ontologies/ORDO
OMIM and Orphanet are on BioPortal, but it seems DECIPHER is not.
New function add_disease_definition works for OMIM / Orphanet terms. Just need the DECIPHER metadata.
phenos <- load_phenotype_to_genes(3)[seq_len(1000)]
phenos2 <- add_disease_definition(phenos = phenos)
Importing Orphanet metadata.
Importing OMIM metadata.
Filling 140010 / 252693 (55.41%) missing Definitions with DiseaseName.
Without DECIPHER, we're still missing ~55% of disease definitions in this file
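The fallback reported in the log above (filling missing Definitions with DiseaseName) can be sketched in base R roughly as follows. This is a minimal illustration on made-up toy data, not the actual internals of add_disease_definition; the IDs and values are placeholders, and only the column names DiseaseName and Definitions are taken from the log message.

```r
# Toy stand-in for the phenotype-to-disease table (all values are made up).
phenos <- data.frame(
  disease_id  = c("OMIM:1", "ORPHA:2", "DECIPHER:3", "DECIPHER:4"),
  DiseaseName = c("Disease A", "Disease B", "Disease C", "Disease D"),
  Definitions = c("Definition of A.", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Flag rows without a definition, then fall back to the disease name.
missing_def <- is.na(phenos$Definitions)
phenos$Definitions[missing_def] <- phenos$DiseaseName[missing_def]

# Report how many definitions were filled, in the style of the log message.
report <- sprintf(
  "Filling %d / %d (%.2f%%) missing Definitions with DiseaseName.",
  sum(missing_def), nrow(phenos), 100 * mean(missing_def)
)
message(report)
```

Note that this only hides the gap: rows filled this way still carry no real definition, so it may be worth keeping the missing_def flag alongside the table.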
I had disease descriptions saved in either a json or dataframe for the hover box in the app. I think I pulled them from an API and added new lines ('\n') to make it fit. Sounds like you've come up with something but I can have a look for how I did it if needed
Are you thinking of the HPO terms, not the diseases?
If so, thanks for the offer but no need. I replaced that file with a new object, HPOExplorer::hpo_meta, which comes from the latest release of HPO and thus covers all terms (the old one had a lot of NAs for terms).
@bobGSmith I think this might be the file you're thinking of. But it actually contains HPO phenotypes, not OMIM/DECIPHER/Orphanet diseases: https://github.com/neurogenomics/rare_disease_celltyping_apps/tree/d1458330ca5f716856ebd60e86976535f08c78b9/Cell_select_interactive/data
Potentially useful resources for mapping OMIM/DECIPHER/Orphanet IDs to names:
Rdiagnosislist
ROMOP: Extracts OMOP concepts from EHR databases. Written by Ben Glicksberg.
omopr (deprecated and removed from CRAN)
rUMLS
OMOP2OBO
phenotron
phenopacket-schema
metamorphosys
Some of these tools are also generally useful for mapping non-standardised terms (e.g. traits in OpenGWAS) to ontology-controlled terms.
Was able to map >99% of Disease IDs in HPO to MONDO IDs, but these MONDO IDs do not seem to match up with the IDs in other slots of the same OBO object. Requested assistance here:
Regarding mapping HPO "disease_id" to MONDO IDs, MONDO names, and MONDO definitions: the missing rate is still very high.
> phenos <- load_phenotype_to_genes("phenotype.hpoa")
Reading cached RDS file: phenotype.hpoa
+ Version: v2023-10-09
> phenos2 <- add_mondo(phenos = phenos)
Adding disease metadata: Definitions, Preferred.Label
Importing Orphanet metadata.
Importing OMIM metadata.
0 / 12527 (0%) disease_name missing.
8228 / 12468 (65.99%) Definitions missing.
Annotating phenos with MONDO metadata.
ℹ All local files already up-to-date!
33 / 12468 (0.26%) MONDO_ID missing.
11573 / 12468 (92.82%) MONDO_name missing.
12396 / 12468 (99.42%) MONDO_definition missing.
Weirdly, only 0.26% of disease_id values lack a MONDO_ID, and yet over 92% are missing a MONDO_name. The baseline missing rate in MONDO itself is very low for "name" (though higher for "def"):
> sum(is.na(mondo$name))/length(mondo$name)*100
[1] 0.06680771
> sum(is.na(mondo$def))/length(mondo$def)*100
[1] 38.12864
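One possible explanation for IDs being mapped while names are almost all missing is a formatting mismatch between the mapped IDs and the keys of mondo$name. The thread does not establish that this is the cause. The sketch below only shows how such a mismatch could be checked, assuming mondo$name is a named vector keyed by term ID (as the sum(is.na(mondo$name)) calls suggest). The toy object, the ID variants, and the helper normalise_id are all hypothetical.

```r
# Toy stand-in for the ontology object: named vectors keyed by term ID
# (assumed layout; all IDs and values here are made up).
mondo <- list(
  name = c("MONDO:0000001" = "Toy disease A", "MONDO:0000002" = "Toy disease B"),
  def  = c("MONDO:0000001" = "A toy definition.", "MONDO:0000002" = NA)
)

# Mapped IDs as they might come out of a cross-reference step
# (hypothetical format variants, plus one genuinely unmapped entry).
mapped_ids <- c(
  "MONDO:0000001",
  "MONDO_0000002",
  "http://purl.obolibrary.org/obo/MONDO_0000002",
  NA
)

# Direct lookup: only exact key matches return a name.
direct <- unname(mondo$name[mapped_ids])

# Hypothetical helper: normalise IDs to the "MONDO:1234567" form.
normalise_id <- function(x) {
  x <- sub("^.*/", "", x)        # drop any URL prefix
  sub("^MONDO_", "MONDO:", x)    # underscore separator -> colon
}
normalised <- unname(mondo$name[normalise_id(mapped_ids)])

# Compare missing rates (in %) before and after normalisation.
rates <- c(
  MONDO_ID_missing        = 100 * mean(is.na(mapped_ids)),
  name_missing_direct     = 100 * mean(is.na(direct)),
  name_missing_normalised = 100 * mean(is.na(normalised))
)
print(rates)
```

If the name missing rate drops to roughly the ID missing rate after normalisation, the gap is a key-format problem rather than missing metadata. If it does not, comparing head(names(mondo$name)) against a few mapped IDs by eye would be the next thing to try.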
Can't find disease descriptions beyond the name of the disease itself in any of the HPO annotation files. But this info should be in a table somewhere. Perhaps in the bulk downloads provided on each disease database: OMIM/DECIPHER/Orphanet https://github.com/neurogenomics/RareDiseasePrioritisation/issues/26