Closed andrewhercules closed 3 years ago
I'll extract some numbers from the efo_otar_profile.owl file, which is supposed to contain the disease-phenotype relationships from Monarch, using the following axiom:
?d skos:related ?p
As a side note, this profile contains all the HP elements that map to the diseases. This could potentially mean a lot more terms than the "official" EFO release. I would try to get some numbers on that as well.
@hammer has previously shown interest in these activities, in case you or @dhimmel have any input.
Upstream work here https://github.com/EBISPOT/efo/issues/794
Using efo_otar_profile.owl v3.23.0
# SPARQL query to get disease-phenotype pairs
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?d ?d_label ?p ?p_label
WHERE {
?d a owl:Class .
?d rdfs:label ?d_label .
?d skos:related ?p .
?p rdfs:label ?p_label
}
d | d_label | p | p_label |
---|---|---|---|
http://www.orpha.net/ORDO/Orphanet_96121 | 7q11.23 microduplication syndrome | http://purl.obolibrary.org/obo/HP_0000577 | Exotropia |
http://www.orpha.net/ORDO/Orphanet_96121 | 7q11.23 microduplication syndrome | http://purl.obolibrary.org/obo/HP_0001382 | Joint hypermobility |
http://www.orpha.net/ORDO/Orphanet_96121 | 7q11.23 microduplication syndrome | http://purl.obolibrary.org/obo/HP_0001999 | Abnormal facial shape |
http://www.orpha.net/ORDO/Orphanet_96121 | 7q11.23 microduplication syndrome | http://purl.obolibrary.org/obo/HP_0000023 | Inguinal hernia |
http://www.orpha.net/ORDO/Orphanet_96121 | 7q11.23 microduplication syndrome | http://purl.obolibrary.org/obo/HP_0000486 | Strabismus |
http://www.orpha.net/ORDO/Orphanet_96121 | 7q11.23 microduplication syndrome | http://purl.obolibrary.org/obo/HP_0000256 | Macrocephaly |
http://www.orpha.net/ORDO/Orphanet_96121 | 7q11.23 microduplication syndrome | http://purl.obolibrary.org/obo/HP_0002119 | Ventriculomegaly |
unique counts | |
---|---|
diseases | 3046 |
phenotypes | 1118 |
p_label | count |
---|---|
Seizure | 679 |
Intellectual disability | 677 |
Global developmental delay | 568 |
Microcephaly | 458 |
Muscular hypotonia | 424 |
Hypertelorism | 416 |
Micrognathia | 376 |
Strabismus | 362 |
Nystagmus | 297 |
Cleft palate | 294 |
Ataxia | 284 |
Epicanthus | 271 |
Failure to thrive | 269 |
Sensorineural hearing impairment | 242 |
Downslanted palpebral fissures | 230 |
High palate | 224 |
Clinodactyly of the 5th finger | 215 |
Short neck | 214 |
Low-set ears | 212 |
Wide nasal bridge | 211 |
d_label | count |
---|---|
Williams syndrome | 87 |
Distal monosomy 10q | 63 |
22q11.2 deletion syndrome | 61 |
Wiedemann-Rautenstrauch syndrome | 60 |
1p36 deletion syndrome | 59 |
7q11.23 microduplication syndrome | 58 |
Oculocerebrorenal syndrome | 57 |
Schwartz-Jampel syndrome | 54 |
Acroosteolysis dominant type | 53 |
Cornelia de Lange syndrome | 53 |
2p15p16.1 microdeletion syndrome | 52 |
Peters plus syndrome | 52 |
PMM2-CDG | 50 |
Wolf-Hirschhorn syndrome | 49 |
Fanconi anemia | 48 |
Smith-Magenis syndrome | 48 |
ADNP-related multiple congenital anomalies-intellectual disability-autism spectrum disorder | 47 |
Cardiac anomalies-developmental delay-facial dysmorphism syndrome | 47 |
Intellectual disability-feeding difficulties-developmental delay-microcephaly syndrome | 47 |
Smith-Lemli-Opitz syndrome | 47 |
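For anyone wanting to reproduce the count tables above from the query results, here is a minimal Python sketch. It assumes the rows are available as (disease label, phenotype label) tuples; the `summarise` helper is hypothetical, and the sample rows are just the seven rows shown in the first table, so the numbers it prints are not the full-ontology counts.

```python
from collections import Counter

# Sample (d_label, p_label) rows copied from the result table above.
rows = [
    ("7q11.23 microduplication syndrome", "Exotropia"),
    ("7q11.23 microduplication syndrome", "Joint hypermobility"),
    ("7q11.23 microduplication syndrome", "Abnormal facial shape"),
    ("7q11.23 microduplication syndrome", "Inguinal hernia"),
    ("7q11.23 microduplication syndrome", "Strabismus"),
    ("7q11.23 microduplication syndrome", "Macrocephaly"),
    ("7q11.23 microduplication syndrome", "Ventriculomegaly"),
]


def summarise(rows, top_n=20):
    """Hypothetical helper: unique counts plus the most frequent labels."""
    pairs = set(rows)  # drop duplicate disease-phenotype pairs
    diseases = Counter(d for d, _ in pairs)    # phenotypes per disease
    phenotypes = Counter(p for _, p in pairs)  # diseases per phenotype
    return {
        "unique_diseases": len(diseases),
        "unique_phenotypes": len(phenotypes),
        "top_diseases": diseases.most_common(top_n),
        "top_phenotypes": phenotypes.most_common(top_n),
    }


summary = summarise(rows)
print(summary["unique_diseases"], summary["unique_phenotypes"])
```

Counting by label rather than by URI is a simplification; two terms sharing a label would be merged here.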
@d0choa thanks for tagging me. Cool to see the SPARQL query and the results.
Some general questions about the approach (I'm new here):
?disease skos:related ?phenotype_1 as well as ?phenotype_1 skos:related ?phenotype_2, such that ?phenotype_1 is both a phenotype of a disease, and a disease in and of itself?
efo.owl or just in efo_otar_profile.owl? What is efo_otar_profile.owl?
Hi @dhimmel,
disease by anatomical system. At the moment, the phenotypes have only been implemented in the profile, but they will soon be propagated to the slim. Once this is done, the differences between the two would be merely technical (e.g. the profile is rooted whereas the slim is not).
Just for some (related) clarification on the nature of the d2p links:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix OBAN: <http://purl.org/oban/>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix RO: <http://purl.obolibrary.org/obo/RO_>
SELECT ?disease ?d_label ?phenotype ?p_label ?source
WHERE {
?disease rdf:type owl:Class ;
<http://www.w3.org/2004/02/skos/core#related> ?phenotype .
[ rdf:type owl:Axiom ;
owl:annotatedSource ?disease ;
owl:annotatedProperty <http://www.w3.org/2004/02/skos/core#related> ;
owl:annotatedTarget ?phenotype ;
dc:source ?source ] .
OPTIONAL {
?disease rdfs:label ?d_label .
}
OPTIONAL {
?phenotype rdfs:label ?p_label .
}
}
One thing I would like you to be aware of is that this "related-to" kind of associative data is completely de-contextualised, i.e. noisy.
The Monarch representation of these mappings has richer contextual metadata, in particular:
Frequency qualifiers, qualification of onset, sex-specificity and evidence codes from ECO, for example:
MONARCH:b00571f419549ef8081e a OBAN:association ;
RO:0002558 ECO:0000269 ;
dc:source PMID:27158779 ;
OBAN:association_has_object HP:0003762 ;
OBAN:association_has_predicate RO:0002200 ;
OBAN:association_has_subject OMIM:617925 ;
:frequencyOfPhenotype "1/1" ;
:has_sex_specificity PATO:0000383 ;
:onset HP:0003577 .
This information may be very important for your analyses; if you need it in EFO, you should make a ticket to that end (the annotations are easy to add, there are just many of them). You can also see in this example that more semantically meaningful links are given by Monarch (i.e. RO:0002200), which may be worthy of consideration for you as well.
Note that the original data also has negated information, i.e. disease2phenotype associations that explicitly do not hold. This information has been filtered out for the Monarch data dump. If you are interested in how the Monarch TTL representation of the HPOA files comes about, check here. I love this subject and I am happy to help @zoependlington make this as useful as possible for you!
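To illustrate why the extra context matters downstream, here is a rough Python sketch of how one such association could be held in memory and its frequency qualifier parsed. The `Association` class and `frequency_fraction` function are hypothetical names for illustration only; the field values are copied from the Turtle example above, and the "n/m" parsing assumes the frequency is given as a count ratio, which will not hold for every record.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass
class Association:
    """Illustrative container mirroring the OBAN example above."""
    subject: str                     # OBAN:association_has_subject
    predicate: str                   # OBAN:association_has_predicate
    obj: str                         # OBAN:association_has_object
    evidence: Optional[str] = None   # RO:0002558
    source: Optional[str] = None     # dc:source
    frequency: Optional[str] = None  # :frequencyOfPhenotype, e.g. "1/1"
    sex: Optional[str] = None        # :has_sex_specificity
    onset: Optional[str] = None      # :onset


def frequency_fraction(assoc):
    """Return the frequency as a Fraction when it has the "n/m" form, else None."""
    if not assoc.frequency or "/" not in assoc.frequency:
        return None
    num, den = assoc.frequency.split("/", 1)
    try:
        return Fraction(int(num), int(den))
    except (ValueError, ZeroDivisionError):
        return None


example = Association(
    subject="OMIM:617925",
    predicate="RO:0002200",
    obj="HP:0003762",
    evidence="ECO:0000269",
    source="PMID:27158779",
    frequency="1/1",
    sex="PATO:0000383",
    onset="HP:0003577",
)
print(frequency_fraction(example))
```

With a plain skos:related link only `subject` and `obj` would be populated, so any filtering on frequency, onset or evidence would be impossible.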
As for our 20.11 release, we have used the 3.24 slim containing all the phenotype terms linked to diseases as described above. Users can search for the terms, but there is not yet any information on which phenotypes are linked to which diseases.
Given the comments raised above by @zoependlington and @matentzn, we are trying to import the enriched Monarch metadata for the relationships. @cmalangone is working directly with the Monarch data dump. Once we have a preliminary working version, we might ask for feedback.
Tagging for our 21.02 release
Some checks about these last indices:
Things to fix or to do:
Hey @cmalangone Does this mean you don't need the links anymore from the EFO OTAR profile? You take them directly from Monarch?
hi @matentzn, we are still prototyping it. At the moment, we are trying to get the relationships from Monarch directly together with all the rich metadata that you suggested. We'll let you know if this works for us. @cmalangone might have some questions for you, though.
Always! :) Let me know how I can help.
hi @matentzn, can you please get in touch with me? I have a couple of good examples with different info. Thanks
Sent you an email!
Phenotypes are integrated with Disease (efo-ontology). The resources involved are:
hpo_phenotypes info: uri: http://compbio.charite.de/jenkins/job/hpo.annotations.current/lastSuccessfulBuild/artifact/current/phenotype.hpoa
hp ontology: uri: https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.owl
mondo ontology: uri: https://github.com/monarch-initiative/mondo/releases/download/v2020-12-18/mondo.owl
PIS, ETL and GraphQL were updated accordingly with the new format/data.
We implemented a final test to check whether the skos:related info is available using the phenotype entries.
One thing that has nothing to do with the ticket but I noticed: You should never use the raw GitHub URLs for referring to any of these ontologies and data sets. For example, the Charité link for hpoa is already stale; it moved elsewhere last month. hp.owl will soon grow beyond 100MB and then migrate away from GitHub to GitHub releases. Always use PURLs:
http://purl.obolibrary.org/obo/hp/hpoa/phenotype.hpoa
http://purl.obolibrary.org/obo/hp.owl
http://purl.obolibrary.org/obo/mondo.owl
HP and Mondo are both correctly versioned:
http://purl.obolibrary.org/obo/hp/releases/2021-02-08/hp.owl
http://purl.obolibrary.org/obo/mondo/releases/2021-01-29/mondo.owl
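The URL pattern in the links above can be captured in a small helper. This is only a sketch inferred from the HP and Mondo examples in this comment; `obo_purl` is a hypothetical name, and not every OBO ontology necessarily publishes versioned PURLs in this layout.

```python
OBO_BASE = "http://purl.obolibrary.org/obo"


def obo_purl(ontology, release=None, filename=None):
    """Build an OBO PURL; pass release (e.g. "2021-02-08") for a pinned version."""
    filename = filename or f"{ontology}.owl"
    if release is None:
        # Unversioned PURL, always resolves to the current release.
        return f"{OBO_BASE}/{filename}"
    # Versioned PURL, pattern inferred from the HP and Mondo links above.
    return f"{OBO_BASE}/{ontology}/releases/{release}/{filename}"


print(obo_purl("hp"))
print(obo_purl("hp", release="2021-02-08"))
print(obo_purl("mondo", release="2021-01-29"))
```

Pinning the release in configuration this way keeps a pipeline run reproducible while still avoiding raw repository URLs.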
Talk soon!
> You should never use the raw github URLs for referring to any of these ontologies and data sets.
Noting that it is sometimes useful to use commit-hash-versioned GitHub URLs to datasets, which can be generated by pressing y. There is always a chance a repo could rewrite history, so agree with @matentzn to use a community-approved permalink whenever possible.
> HP and Mondo are both correctly versioned:
Didn't know about these versioned PURLs. Very useful. Looks like it also works with Gene Ontology, e.g. http://purl.obolibrary.org/obo/go/releases/2021-02-01/go-basic.json.gz
IIUC EFO is not indexed by OBO Foundry, such that versioned links should go to the GitHub releases like https://github.com/EBISPOT/efo/releases/download/v3.27.0/efo_otar_slim.owl ? Or is it preferred to do https://www.ebi.ac.uk/efo/releases/v3.27.0/efo_otar_slim.owl ?
Hey @dhimmel
Yeah, if you include the commit hash, I guess there are some use cases - however, as long as we are talking about ontology release files, using commit hash references should be equivalent to using the version IRI! :)
Versioned PURLs should really work for all OBO ontologies, but they do not quite yet :{ All the bigger ones have them, though.
Regarding EFO, good questions! Not sure whether the efo otar slim has a purl.. @zoependlington ?
The Open Targets slim and profile do have versioned purls. e.g. http://www.ebi.ac.uk/efo/releases/v3.14.0/efo_otar_slim.owl http://www.ebi.ac.uk/efo/releases/v3.14.0/efo_otar_profile.owl
Thanks everyone for the useful feedback. We are definitely reviewing the use of permalinks in our codebase. #1395
> Phenotypes are integrated with Disease (efo-ontology)
@cmalangone if I understand your comment correctly, you were saying that:
efo_otar_profile.owl
to get more of the "contextual metadata"
Is that correct? If so, is the code and/or output dataset available? I'd like the same thing, but don't want to re-implement the extraction if you've already done it.
I am also interested in seeing this pipeline if it's public!
@dhimmel: Sorry for the delay, I was off. platform-input-support retrieves the latest EFO (OWL) file and transforms it into a JSON file. It also downloads the http://purl.obolibrary.org/obo/hp/hpoa/phenotype.hpoa file. https://github.com/opentargets/platform-input-support/blob/master/config.yaml#L89
The ETL step reads the EFO json file and it joins the field "dbXRefs" with the different IDs from MONDO (using the field 'id') and phenotype (using the field 'databaseId') resources.
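As a rough illustration of that join (the real ETL is the Scala code linked in this comment, not this), here is a plain-Python sketch. Only the field names "dbXRefs" and "databaseId" come from the description above; the record contents, the "phenotype" key, the "EFO_EXAMPLE" identifier and the `join_phenotypes` function are made up for the example. The MONDO join would work the same way, keyed on the MONDO 'id' field.

```python
from collections import defaultdict

# Toy records. "dbXRefs" and "databaseId" are the fields named above;
# every other key and the EFO identifier are invented for illustration.
efo_records = [
    {"id": "EFO_EXAMPLE", "dbXRefs": ["OMIM:617925"]},
]
phenotype_records = [
    {"databaseId": "OMIM:617925", "phenotype": "HP:0003762"},
    {"databaseId": "OMIM:000000", "phenotype": "HP:0000001"},
]


def join_phenotypes(efo_records, phenotype_records):
    """Attach to each disease the phenotype rows whose databaseId is in its dbXRefs."""
    by_database_id = defaultdict(list)
    for rec in phenotype_records:
        by_database_id[rec["databaseId"]].append(rec)

    joined = []
    for disease in efo_records:
        matched = [
            row
            for xref in disease.get("dbXRefs", [])
            for row in by_database_id.get(xref, [])
        ]
        joined.append({**disease, "phenotypes": matched})
    return joined


joined = join_phenotypes(efo_records, phenotype_records)
print(joined[0]["phenotypes"])
```

Building the lookup once and probing it per cross-reference keeps the join linear in the number of records, which is the same idea a Spark join on those columns expresses.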
Code here:
https://github.com/opentargets/platform-etl-backend/blob/master/src/main/scala/io/opentargets/etl/backend/Disease.scala#L121
https://github.com/opentargets/platform-etl-backend/blob/master/src/main/scala/io/opentargets/etl/backend/Disease.scala#L122
https://github.com/opentargets/platform-etl-backend/blob/master/src/main/scala/io/opentargets/etl/backend/Disease.scala#L123
@dhimmel Info about inputs/outputs
INPUTS:
EFO owl file: https://storage.googleapis.com/open-targets-data-releases/21.06/input/annotation-files/ontology/efo_otar_slim.owl
Mondo owl file: open-targets-data-releases/21.06/input/annotation-files/ontology/mondo.owl
EFO json after platform-input-support: https://storage.googleapis.com/open-targets-data-releases/21.06/input/annotation-files/ontology/efo_json/ontology-efo-v3.31.0.jsonl
MONDO after platform-input-support: https://storage.googleapis.com/open-targets-data-releases/21.06/input/annotation-files/ontology/efo_json/ontology-mondo.jsonl
phenotype after platform-input-support: https://storage.googleapis.com/open-targets-data-releases/21.06/input/annotation-files/ontology/efo_json/hpo-phenotypes-2021-06-18.jsonl
OUTPUT: https://console.cloud.google.com/storage/browser/open-targets-data-releases/21.06/output/etl/json/diseases or gs://open-targets-data-releases/21.06/output/etl/json/diseases/*
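For inspecting the .jsonl inputs listed above after downloading them locally, a reader along these lines may be enough. This is a sketch that assumes one JSON object per line; `read_jsonl` is a hypothetical helper and the demo records are invented, not taken from the actual files.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo with a throwaway file; the record content is invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.jsonl"
    sample.write_text('{"id": "EXAMPLE_1"}\n\n{"id": "EXAMPLE_2"}\n', encoding="utf-8")
    records = list(read_jsonl(sample))

print(len(records))
```

Reading lazily with a generator avoids loading a whole release file into memory when only a few records are needed.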
With the release of EFO 3.23.0, there will be new phenotype data that we can integrate into the Platform to enrich the disease index and display on the disease profile page