Closed by cbizon 1 year ago
Here's what I'm thinking for this:
I think this is a good place to start, and we can tweak as needed once the filtering is in place and we see how much it helps.
Could we make it a parameter on the query? Maybe an attribute somewhere in the TRAPI message? Then we could experiment easily without changing any code.
Added in #416
Here is a cypher query that is ending up in uberongraph coming from strider:
This is a 1-hop query connecting 2 pinned nodes. But because of the subclassing, far more edges are searched than the M×N you would expect from the number of pinned identifiers alone.
From the bigger set of ~100 mondos, there are about 50,000 subclasses found. They're heavily skewed towards a few very high-level nodes:
I increasingly think we can save time and reduce false positives by filtering these chunky nodes inside of strider. I also suspect that using information content as a proxy would work well. The first couple in that list have an IC < 35 as reported by nodenorm.
My proposal is that for non-pinned nodes, strider takes the IC it already gets back from its nodenorm (NN) calls and filters out nodes whose IC falls below a cutoff that can be specified at query time. The cutoff could default to 0 (no filtering), or we could choose to start it higher, like 35 or 40.
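To make the proposal concrete, here is a rough sketch of the filter. It is not strider's actual code: the function name, the `ic_cutoff` parameter name, the shape of the input mapping, and the choice to keep nodes with no reported IC are all assumptions for illustration. The CURIEs and IC values in the usage example are placeholders, not real nodenorm output.

```python
from typing import Dict, List, Optional, Set


def filter_by_information_content(
    node_ic: Dict[str, Optional[float]],
    pinned: Set[str],
    ic_cutoff: float = 0.0,
) -> List[str]:
    """Return the CURIEs that survive the IC filter (hypothetical helper).

    node_ic   -- CURIE -> information content, assumed to have been collected
                 from the nodenorm responses; None means no IC was reported.
    pinned    -- CURIEs pinned in the query; these are never filtered.
    ic_cutoff -- query-time cutoff; 0 keeps every node with a non-negative IC.
    """
    kept = []
    for curie, ic in node_ic.items():
        if curie in pinned:
            # Pinned nodes were asked for explicitly, so always keep them.
            kept.append(curie)
        elif ic is None:
            # Assumption: a node with no reported IC is kept, not dropped.
            kept.append(curie)
        elif ic >= ic_cutoff:
            # Specific enough: IC is at or above the cutoff.
            kept.append(curie)
    return kept


# Placeholder identifiers and made-up IC values, for illustration only.
example_ic = {
    "EX:very_general": 30.0,
    "EX:specific": 80.0,
    "EX:no_ic": None,
    "EX:pinned_general": 20.0,
}
survivors = filter_by_information_content(
    example_ic, pinned={"EX:pinned_general"}, ic_cutoff=35.0
)
print(survivors)
```

With a cutoff of 35, only the unpinned low-IC node is dropped; with the default of 0, everything passes through, which keeps current behavior unless a query opts in.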