neurogenomics / HPOExplorer

Functions for working with the Human Phenotype Ontology data
https://neurogenomics.github.io/HPOExplorer/

Add descriptions of each disease #35

Closed bschilder closed 8 months ago

bschilder commented 1 year ago

I can't find disease descriptions beyond the name of the disease itself in any of the HPO annotation files, but this info should be in a table somewhere. It may be in the bulk downloads provided by each disease database (OMIM/DECIPHER/Orphanet).

https://github.com/neurogenomics/RareDiseasePrioritisation/issues/26

bschilder commented 1 year ago

I've applied for full data access to OMIM, but in the meantime I might be able to get this info from BioPortal:

https://bioportal.bioontology.org/ontologies/OMIM
https://bioportal.bioontology.org/ontologies/ORDO

OMIM and Orphanet are on BioPortal, but it seems DECIPHER is not.

bschilder commented 1 year ago

New function add_disease_definition works for OMIM / Orphanet terms. Just need the DECIPHER metadata

bschilder commented 1 year ago
phenos <- load_phenotype_to_genes(3)[seq_len(1000)]
phenos2 <- add_disease_definition(phenos = phenos)
Importing Orphanet metadata.
Importing OMIM metadata.
Filling 140010 / 252693 (55.41%) missing Definitions with DiseaseName.

Without DECIPHER, we're still missing ~55% of disease definitions in this file
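
The "Filling ... missing Definitions with DiseaseName" step in the log above can be illustrated with a minimal base R sketch. This is not the actual implementation of add_disease_definition; the helper name `fill_missing_definitions`, the toy table, and its made-up IDs are assumptions for illustration only, with column names taken from the log message.

```r
# Toy stand-in for the annotation table (IDs and text are made up).
phenos <- data.frame(
  disease_id  = c("OMIM:000001", "ORPHA:000001", "DECIPHER:000001"),
  DiseaseName = c("Disease A", "Disease B", "Disease C"),
  Definitions = c("A definition of disease A.", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: where a definition is NA or empty,
# fall back to the disease name and report how many rows were filled.
fill_missing_definitions <- function(dat,
                                     def_col = "Definitions",
                                     name_col = "DiseaseName") {
  missing <- is.na(dat[[def_col]]) | !nzchar(dat[[def_col]])
  message(sprintf("Filling %d / %d (%.2f%%) missing %s with %s.",
                  sum(missing), nrow(dat), 100 * mean(missing),
                  def_col, name_col))
  dat[[def_col]][missing] <- dat[[name_col]][missing]
  dat
}

phenos_filled <- fill_missing_definitions(phenos)
```

Note that a fallback like this hides the true missing rate downstream, so it may be worth keeping a logical column that records which rows were filled.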

bobGSmith commented 1 year ago

I had disease descriptions saved in either a json or dataframe for the hover box in the app. I think I pulled them from an API and added new lines ('\n') to make it fit. Sounds like you've come up with something but I can have a look for how I did it if needed

bschilder commented 1 year ago

> I had disease descriptions saved in either a json or dataframe for the hover box in the app. I think I pulled them from an API and added new lines ('\n') to make it fit. Sounds like you've come up with something but I can have a look for how I did it if needed

Are you thinking of the HPO terms, not the diseases?

If so, thanks for the offer but no need. I replaced that file with a new object HPOExplorer::hpo_meta which comes from the latest release of HPO and thus covers all terms (the old one had a lot of NAs for terms)

bschilder commented 1 year ago

@bobGSmith I think this might be the file you're thinking of. But it actually contains HPO phenotypes, not OMIM/DECIPHER/Orphanet diseases: https://github.com/neurogenomics/rare_disease_celltyping_apps/tree/d1458330ca5f716856ebd60e86976535f08c78b9/Cell_select_interactive/data

bschilder commented 1 year ago

Potentially useful resources for mapping OMIM/DECIPHER/Orphanet IDs to names:

R packages

Other

Some of these tools are also generally useful for mapping non-standardised terms (e.g. traits in OpenGWAS) to ontology-controlled terms.

bschilder commented 12 months ago

Was able to map >99% of Disease IDs in HPO to MONDO IDs, but these MONDO IDs do not seem to match up with the IDs in other slots of the same OBO object. Requested assistance here:

bschilder commented 12 months ago

Regarding mapping HPO "disease_id" to MONDO IDs, MONDO names, and MONDO definitions: the missing rate is still very high.

> phenos <- load_phenotype_to_genes("phenotype.hpoa")
Reading cached RDS file: phenotype.hpoa
+ Version: v2023-10-09
> phenos2 <- add_mondo(phenos = phenos)
Adding disease metadata: Definitions, Preferred.Label
Importing Orphanet metadata.
Importing OMIM metadata.
0 / 12527 (0%) disease_name missing.
8228 / 12468 (65.99%) Definitions missing.
Annotating phenos with MONDO metadata.
ℹ All local files already up-to-date!
33 / 12468 (0.26%) MONDO_ID missing.
11573 / 12468 (92.82%) MONDO_name missing.
12396 / 12468 (99.42%) MONDO_definition missing.

Weirdly, only 0.26% of disease_id values lack a MONDO_ID, and yet over 92% are missing a MONDO_name. The baseline missing rate within MONDO itself is very low for "name" (though higher for "def"), so the gap is not explained by MONDO simply lacking names:

> sum(is.na(mondo$name))/length(mondo$name)*100
[1] 0.06680771
> sum(is.na(mondo$def))/length(mondo$def)*100
[1] 38.12864
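
One way to narrow down a gap like this is to check whether the mapped IDs actually occur as keys of the slot being looked up. The sketch below uses toy data and assumes the ontology slots are named character vectors keyed by ID; the `mondo` list, `mapped_ids`, and the helper `missing_pct` are all hypothetical. The ":" versus "_" mismatch in the toy data is just one possible cause, not a confirmed diagnosis of the issue above.

```r
# Toy stand-in for an ontology object: slots are named vectors keyed by ID.
mondo <- list(
  name = c("MONDO:0000001" = "toy disease one",
           "MONDO:0000002" = "toy disease two"),
  def  = c("MONDO:0000001" = "A toy definition.",
           "MONDO:0000002" = NA)
)

# Hypothetical output of an ID-mapping step: one ID uses a different
# separator, and one disease could not be mapped at all.
mapped_ids <- c("MONDO:0000001", "MONDO_0000002", NA)

# Percentage of NA entries; same quantity as sum(is.na(x))/length(x)*100.
missing_pct <- function(x) 100 * mean(is.na(x))

# Look up names by ID; IDs that are not keys of the slot come back as NA.
mondo_name <- unname(mondo$name[mapped_ids])

id_missing   <- missing_pct(mapped_ids)
name_missing <- missing_pct(mondo_name)

# IDs that were mapped but do not exist as keys in the name slot.
non_na_ids <- mapped_ids[!is.na(mapped_ids)]
unmatched  <- setdiff(non_na_ids, names(mondo$name))
```

If `unmatched` is large while `id_missing` is small, the mapped IDs and the slot keys are probably drawn from different identifier formats or namespaces, which would produce exactly the pattern of few missing IDs but many missing names.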