monarch-initiative / helpdesk

The Monarch Initiative Helpdesk
BSD 3-Clause "New" or "Revised" License

How to retrieve cross-species mappings from the API #56

Closed by NuriaQueralt 1 year ago

NuriaQueralt commented 2 years ago

Dear team, @kevinschaper,

I built a library that generates knowledge graphs by retrieving selected edges from the Monarch KG API and representing them in Neo4j. I would like to see cross-species mappings, such as MPO-HPO phenotype mappings, when exploring the Neo4j graph. Is there a way in the current Monarch graph API to retrieve these cross-species mappings? Otherwise, I guess the only way to get them is via indirect relationships such as HPO-gene-ortholog-MPO?

Many thanks in advance. With kind regards, Núria

kevinschaper commented 2 years ago

Hi @NuriaQueralt,

Sorry for the very late response to this. I'm afraid that I don't know the answer. I don't think anything in the API does cross-species mapping between phenotypes.

@matentzn, can these mappings be extracted from the ontology files? If so, I think I could help think through a pipeline to get them into Neo4j. Unfortunately, I don't know enough about the cross-species phenotype similarity work to know whether it is materialized and curated or computed at runtime.

matentzn commented 2 years ago

@kevinschaper we should add a semantic similarity endpoint to the monarch API to retrieve the exact information that is used to drive our phenotypic similarity tools like phenogrid etc. Will you make an issue about that at the monarch API repo?

In the meantime, can you check if there is an "owlsim.cache" file somewhere in the data that is published? I vaguely remember that this contains all of the semantic similarity information used to drive the site. Can you or someone in your team check:

  1. Does the file exist, is it used by the pipeline, and is it downloadable?
  2. Can you spot-check whether 2-3 numbers shown on a UI widget are reflected in that dump?

kevinschaper commented 2 years ago

owlsim.cache is available here: https://data.monarchinitiative.org/latest/owlsim/index.html

I don't see that it's used anywhere in the old data pipeline outside of owlsim itself.

I'm not too familiar with owlsim, but I'm taking a shot at this check.

It looks like for the row:

HP_0012115  HP_0010978  3.377149544570363   MP_0001790

I'm getting this result:

[Screenshot: Screen Shot 2022-06-16 at 11 43 29 AM]

The rounding looks off, but I assume that's about the number we would expect? I've tried a couple of other comparisons and they just keep spinning, so I'm going to hold off on this so that I don't bring the server down with weird queries.

matentzn commented 2 years ago

Thank you @kevinschaper! Looks about what I expected. @NuriaQueralt, we will open a new ticket to actually offer phenotype-phenotype mappings plus phenotypic similarity through the API. In the meantime, I think that owlsim.cache file is your best bet (in addition to what I told you here). I can't speak to the rounding, to be honest.
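
For anyone who wants to try the cache file route, here is a minimal Python sketch of reading rows shaped like the one quoted above and keeping only those that mention both HP and MP terms. It is an illustration under assumptions, not a documented owlsim format: the four-column layout is inferred from the single sample row in this thread, the field names (`term_a`, `term_b`, `score`, `related_term`) are hypothetical, and what the fourth column actually means should be checked against owlsim itself.

```python
from typing import Iterable, Iterator, NamedTuple


class CacheRow(NamedTuple):
    # Field names are hypothetical; the layout is inferred from one sample row.
    term_a: str
    term_b: str
    score: float
    related_term: str  # assumed: a term linking term_a and term_b


def parse_cache_lines(lines: Iterable[str]) -> Iterator[CacheRow]:
    """Yield one CacheRow per well-formed, whitespace-separated line."""
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        term_a, term_b, raw_score, related_term = fields
        try:
            score = float(raw_score)
        except ValueError:
            continue  # skip rows whose third column is not numeric
        yield CacheRow(term_a, term_b, score, related_term)


def prefix(term: str) -> str:
    """Return the ontology prefix, e.g. 'HP' for 'HP_0012115'."""
    return term.split("_", 1)[0]


def is_cross_species(row: CacheRow, prefixes=("HP", "MP")) -> bool:
    """True if every requested prefix appears somewhere in the row."""
    seen = {prefix(t) for t in (row.term_a, row.term_b, row.related_term)}
    return all(p in seen for p in prefixes)


# The sample row quoted earlier in this thread.
sample = ["HP_0012115  HP_0010978  3.377149544570363   MP_0001790"]
rows = [r for r in parse_cache_lines(sample) if is_cross_species(r)]
```

To run this over the real download, an open file handle can be passed in place of `sample`, since `parse_cache_lines` accepts any iterable of lines. The filtered rows could then be written out as edges for Neo4j, keeping the score as an edge property.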

NuriaQueralt commented 2 years ago

Thank you @kevinschaper and @matentzn !!! That all helps :)