Open bschilder opened 2 months ago
Of course, this won't yield the "true" semantic similarity between the nodes unless your graph is sufficiently large and complete to accurately characterise the relationships between the nodes. Still, it can come in handy.
monarch_semsim does something a bit like this, but instead compares two graphs to each other.
Is it possible to query the semantic similarity API for a single graph instead (so we can get accurate similarity scores between all edge-connected nodes within the graph)? Might be a nice complement to my graph_semsim function, which only considers the local graph instance.
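To make the "local graph instance" idea concrete, here is a minimal Python sketch of scoring each edge using only the graph at hand and keeping the score as an edge weight. It is not the graph_semsim implementation: the function name `neighbor_jaccard_weights`, the toy edge list, and the choice of Jaccard overlap between neighbour sets as a stand-in similarity measure are all assumptions made for illustration.

```python
def neighbor_jaccard_weights(edges):
    """Return {(u, v): score} for every edge, using only the local graph.

    The score is the Jaccard overlap of the two endpoints' neighbour sets
    (each node is counted in its own set). This is a stand-in measure, not
    the metric used by graph_semsim or by the Monarch semsim API.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    weights = {}
    for u, v in edges:
        a = neighbors[u] | {u}
        b = neighbors[v] | {v}
        weights[(u, v)] = len(a & b) / len(a | b)
    return weights


# Hypothetical toy graph: a triangle A-B-C with a pendant node D on C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
weights = neighbor_jaccard_weights(edges)
```

Because the scores depend entirely on which edges happen to be present, a sparse or incomplete graph will give different weights than a full ontology-backed comparison would.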
The Monarch semsim API does take two sets of node IDs, and computes the best match from each in set A to those in set B, and vice-versa (essentially designed to support https://monarchinitiative.org/explore#phenotype-explorer).
I suppose for a single graph we could just pass the nodes as both set A and set B, but the functionality giving just the best match means that each node will just be reported to match itself (I think). Perhaps this is a feature request for the API - all-vs-all semantic similarity queries. Could be intensive though given the O(n^2) nature of the result. Tagging @kevinschaper
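A small Python sketch of the difference, under loud assumptions: it does not call the Monarch API, and `best_matches`, `all_vs_all`, `toy_sim`, and the term annotations are all hypothetical. It only illustrates why passing the same nodes as both sets to a best-match query would return each node as its own match, and how the number of pairs grows for an all-vs-all query.

```python
from itertools import combinations


def best_matches(set_a, set_b, sim):
    """For each node in set_a, keep only its single highest-scoring node in set_b."""
    return {a: max(set_b, key=lambda b: sim(a, b)) for a in set_a}


def all_vs_all(nodes, sim):
    """Score every unordered pair of distinct nodes: n * (n - 1) / 2 comparisons."""
    return {(a, b): sim(a, b) for a, b in combinations(nodes, 2)}


# Hypothetical annotations: each node is described by a set of ontology terms.
terms = {
    "n1": {"t1", "t2", "t3"},
    "n2": {"t2", "t3"},
    "n3": {"t3", "t4"},
}


def toy_sim(a, b):
    """Jaccard overlap of term sets, used here as a stand-in similarity score."""
    return len(terms[a] & terms[b]) / len(terms[a] | terms[b])


nodes = list(terms)
self_matches = best_matches(nodes, nodes, toy_sim)  # every node matches itself
pairwise = all_vs_all(nodes, toy_sim)               # one score per distinct pair
```

With n nodes the all-vs-all result holds n * (n - 1) / 2 scores, which is where the O(n^2) cost comes from.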
Yeah, instead of returning only the top 1 similar node, I'd want to return each node's similarity with every other node. I agree, for large graphs this would be a massive computation. Perhaps this could be precomputed and stored as a separate database (version-controlled and regenerated for each KG release). Is this something that would be feasible @kevinschaper ?
A very common thing users might want to do is to compute the semantic similarity between nodes in a graph and then store that data back in the edges of the graph (to use as edge weights later).
I've created the following function to automate this.