related-sciences / nxontology-ml

Machine learning to classify ontology nodes

Include EFO subsets as features for model #14

Closed by dhimmel 11 months ago

dhimmel commented 1 year ago

https://github.com/related-sciences/nxontology-data/commit/650c09fb6f044d0ce65f37595261c68ec4e305dd adds a subsets field for each EFO term.

Here are the counts of the 20 most common subsets, where the last column is the number of EFO terms in each subset:

| subset_uri | subset_id | efo_id (count) |
| --- | --- | --- |
| http://purl.obolibrary.org/obo/mondo#ordo_disease | mondo#ordo_disease | 3009 |
| http://purl.obolibrary.org/obo/mondo#gard_rare | mondo#gard_rare | 1901 |
| http://purl.obolibrary.org/obo/chebi#3_STAR | chebi#3_STAR | 1614 |
| http://purl.obolibrary.org/obo/mondo#ordo_malformation_syndrome | mondo#ordo_malformation_syndrome | 1568 |
| http://purl.obolibrary.org/obo/mondo#ordo_group_of_disorders | mondo#ordo_group_of_disorders | 1333 |
| http://purl.obolibrary.org/obo/mondo#disease_grouping | mondo#disease_grouping | 1330 |
| http://purl.obolibrary.org/obo/mondo#ordo_clinical_subtype | mondo#ordo_clinical_subtype | 646 |
| http://purl.obolibrary.org/obo/uberon/core#efo_slim | core#efo_slim | 568 |
| http://purl.obolibrary.org/obo/uberon/core#pheno_slim | core#pheno_slim | 472 |
| http://purl.obolibrary.org/obo/uberon/core#uberon_slim | core#uberon_slim | 420 |
| http://purl.obolibrary.org/obo/uberon/core#human_reference_atlas | core#human_reference_atlas | 375 |
| http://purl.obolibrary.org/obo/uberon/core#vertebrate_core | core#vertebrate_core | 241 |
| http://purl.obolibrary.org/obo/hp#hposlim_core | hp#hposlim_core | 238 |
| http://purl.obolibrary.org/obo/mondo#obsoletion_candidate | mondo#obsoletion_candidate | 220 |
| http://purl.obolibrary.org/obo/mondo#ordo_morphological_anomaly | mondo#ordo_morphological_anomaly | 209 |
| http://purl.obolibrary.org/obo/mondo#ordo_etiological_subtype | mondo#ordo_etiological_subtype | 154 |
| http://purl.obolibrary.org/obo/mondo#clingen | mondo#clingen | 108 |
| http://purl.obolibrary.org/obo/fbbt#cur | fbbt#cur | 107 |
| http://purl.obolibrary.org/obo/mondo#predisposition | mondo#predisposition | 96 |
| http://purl.obolibrary.org/obo/go#goslim_pir | go#goslim_pir | 64 |
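
For anyone wanting to reproduce this kind of tally, here is a minimal sketch of counting terms per subset. It assumes the subsets are already loaded as a mapping from term ID to a list of subset IDs; the `terms` dict, its term IDs, and the `count_subsets` helper are all hypothetical and not taken from nxontology-data.

```python
from collections import Counter

# Hypothetical toy input: term ID -> subset IDs.
# The term IDs and memberships are made up for illustration only.
terms = {
    "EFO:example_1": ["mondo#gard_rare", "mondo#ordo_disease"],
    "EFO:example_2": ["mondo#ordo_disease"],
    "EFO:example_3": ["mondo#disease_grouping"],
    "EFO:example_4": [],  # a term that belongs to no subset
}


def count_subsets(term_to_subsets):
    """Count how many terms belong to each subset."""
    counts = Counter()
    for subsets in term_to_subsets.values():
        # set() guards against a subset being listed twice on one term
        counts.update(set(subsets))
    return counts


subset_counts = count_subsets(terms)
# most_common(20) returns up to 20 (subset_id, count) pairs, largest first
top_subsets = subset_counts.most_common(20)
```

Counting distinct terms per subset (rather than raw occurrences) is a design choice made here so that the numbers read as "terms in subset".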

Certain subsets are likely informative for our disease precision classification, including mondo#ordo_group_of_disorders, mondo#disease_grouping, mondo#gard_rare, and mondo#ordo_etiological_subtype, among others.
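
One possible way to expose these subsets to the model is as binary indicator features, one per selected subset. The sketch below is only an illustration of that idea: the `SELECTED_SUBSETS` list, the `subset_features` helper, and the `subset__` feature-name prefix are assumptions, not existing nxontology-ml code.

```python
# Subsets named in this issue as likely informative.
# The selection and ordering here are an assumption for illustration.
SELECTED_SUBSETS = [
    "mondo#ordo_group_of_disorders",
    "mondo#disease_grouping",
    "mondo#gard_rare",
    "mondo#ordo_etiological_subtype",
]


def subset_features(subsets, selected=SELECTED_SUBSETS):
    """Return a dict of 0/1 indicators, one per selected subset.

    Subsets on the term that are not in `selected` are ignored.
    """
    present = set(subsets)
    return {f"subset__{name}": int(name in present) for name in selected}


# Hypothetical term that is in gard_rare and clingen
features = subset_features(["mondo#gard_rare", "mondo#clingen"])
```

Restricting the indicators to a fixed list keeps the feature vector the same width for every term; whether to use a hand-picked list or all subsets above some count threshold is left open.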