monarch-initiative / biolink-api

API for linked biological knowledge
https://api.monarchinitiative.org/api/
BSD 3-Clause "New" or "Revised" License

Clique leader not reachable from outgoing edges from a node in SciGraph #321

Closed: deepakunni3 closed this issue 4 years ago

deepakunni3 commented 4 years ago

When querying SciGraph, I observed something odd:

If I query for an NCBIGene node and try to get OUTGOING edges, I do not get any HGNC node (even though the clique leader is expected)

Alternatively, if I query for the same NCBIGene node and try to get INCOMING edges, I get the proper HGNC node.

Is there a reason why this changed? Shouldn't equivalentClass edges between the two nodes be bidirectional?

scigraph.scigraph_util.SciGraph.traverse_chain() fetches OUTGOING edges by default. Previously, this method would return the clique leader.

The context for all this is that scigraph.scigraph_util.SciGraph#gene_to_uniprot_proteins now returns no protein identifier for a gene unless the input CURIE is the clique leader.

So what changed? @kshefchek thoughts?

kshefchek commented 4 years ago

SciGraph does not add the inverse edge when an OWL property is symmetric. But it is odd that this changed; I'll take a closer look. Using /dynamic/cliqueLeader/{id} is the safest way to get a clique leader.
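As a rough illustration of the suggestion above, here is a minimal Python sketch of calling the cliqueLeader service. Only the path /dynamic/cliqueLeader/{id} comes from this thread; the base URL passed in and both function names are assumptions made for the example, and the shape of the JSON response is deliberately not assumed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def clique_leader_url(scigraph_base: str, curie: str) -> str:
    """Build the URL for the /dynamic/cliqueLeader/{id} endpoint.

    scigraph_base is whatever SciGraph deployment is in use (an assumption
    of this sketch; no specific host is implied).
    """
    # Percent-encode the CURIE so the colon is safe inside a path segment.
    return f"{scigraph_base.rstrip('/')}/dynamic/cliqueLeader/{quote(curie, safe='')}"


def fetch_clique_leader_graph(scigraph_base: str, curie: str) -> dict:
    """Fetch and decode the JSON response without assuming its structure."""
    with urlopen(clique_leader_url(scigraph_base, curie)) as response:
        return json.load(response)
```

The caller would then inspect the returned document for the leader node, which avoids depending on the direction in which the equivalentClass edge happens to be stored.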

kshefchek commented 4 years ago

@deepakunni3 I don't see a change in the Entrez Gene ingest for making eqs to HGNC. Do you recall the specific ID(s)?

deepakunni3 commented 4 years ago

To give more context:

Currently api-prod uses MyGeneInfo for converting gene-to-protein, whereas api-dev uses SciGraph for the conversion (which is what we want).

https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene%3A6469/function?rows=100&facet=false&unselect_evidence=false&exclude_automatic_assertions=false&fetch_objects=false&use_compact_associations=false

Yields results since we convert NCBIGene:6469 to its equivalent UniProtKB via MyGeneInfo.

Now, the same query to api-dev,

https://api-dev.monarchinitiative.org/api/bioentity/gene/NCBIGene%3A6469/function?rows=100&facet=false&unselect_evidence=false&exclude_automatic_assertions=false&fetch_objects=false&use_compact_associations=false

Yields no result. This is because we don't get any clique leader for NCBIGene:6469, via scigraph.scigraph_util.SciGraph.traverse_chain(), which is required to fetch a UniProtKB identifier from SciGraph.
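To make the comparison easy to repeat, the two request URLs above can be generated from one helper. This is only a sketch: the function name is made up for the example, while the hosts, path, and query parameters are copied from the URLs quoted in this comment.

```python
from urllib.parse import quote, urlencode

# Query parameters exactly as they appear in the URLs above.
FUNCTION_PARAMS = {
    "rows": 100,
    "facet": "false",
    "unselect_evidence": "false",
    "exclude_automatic_assertions": "false",
    "fetch_objects": "false",
    "use_compact_associations": "false",
}


def gene_function_url(host: str, gene_curie: str) -> str:
    """Build the /bioentity/gene/{id}/function URL for a given API host."""
    encoded_id = quote(gene_curie, safe="")  # NCBIGene:6469 -> NCBIGene%3A6469
    return f"https://{host}/api/bioentity/gene/{encoded_id}/function?{urlencode(FUNCTION_PARAMS)}"


prod_url = gene_function_url("api.monarchinitiative.org", "NCBIGene:6469")
dev_url = gene_function_url("api-dev.monarchinitiative.org", "NCBIGene:6469")
```

Fetching both URLs and comparing the responses should show whether the two deployments still disagree for a given gene.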

After digging around, I noticed that the directionality of the equivalentClass edge changed. traverse_chain() tries to get OUTGOING equivalentClass edges from a node. In this scenario, however, there are no OUTGOING equivalentClass edges from NCBIGene:6469; there are only INCOMING equivalentClass edges to NCBIGene:6469.
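The effect can be reproduced with a small in-memory model. This is not the actual traverse_chain() code, only a sketch of the behaviour described: the edge is stored in one direction, so a lookup restricted to OUTGOING edges misses it. The neighbors() helper and the HGNC:EXAMPLE CURIE are placeholders invented for the illustration.

```python
# (subject, predicate, object) triples. The edge is stored once, pointing
# from the HGNC node to the NCBIGene node. HGNC:EXAMPLE is a placeholder,
# not a real identifier.
EDGES = [("HGNC:EXAMPLE", "equivalentClass", "NCBIGene:6469")]


def neighbors(node, predicate, direction="OUTGOING"):
    """Return nodes linked to `node` by `predicate` in the given direction."""
    found = []
    for subject, pred, obj in EDGES:
        if pred != predicate:
            continue
        # OUTGOING: node is the subject of the stored edge.
        if direction in ("OUTGOING", "BOTH") and subject == node:
            found.append(obj)
        # INCOMING: node is the object of the stored edge.
        if direction in ("INCOMING", "BOTH") and obj == node:
            found.append(subject)
    return found


outgoing = neighbors("NCBIGene:6469", "equivalentClass", "OUTGOING")
incoming = neighbors("NCBIGene:6469", "equivalentClass", "INCOMING")
both = neighbors("NCBIGene:6469", "equivalentClass", "BOTH")
```

Here `outgoing` is empty while `incoming` and `both` contain the HGNC placeholder, which mirrors what the two SciGraph queries return for this gene.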

The logic of the traverse_chain() method hasn't changed in a long time, which is what led me to believe that something else changed.

@kshefchek I would appreciate your input on this.

We cannot update api-prod until this is fixed.

kshefchek commented 4 years ago

I'll have to take a closer look at dipper to see if the data changed, but I think the easiest thing is to use the cliqueLeader service.

deepakunni3 commented 4 years ago

Sure! It seems more straightforward and correct than getting it via traverse_chain 👍