ranking-agent / aragorn-ranker

Exposes TRAPI functions to add literature co-occurrence edges, convert publications to edge weights, and provide scores for answers.
MIT License

Ranker Slowness and Omnicorp Oddities #119

Open kennethmorton opened 1 year ago

kennethmorton commented 1 year ago

I am trying to track down some slowness in Aragorn ranker. It was originally believed that the slowness was somehow related to query_id/qnode_id issues coming from Automat. However, that appears to be correlation rather than the cause of the slowness. It may still ultimately be related, but this issue focuses on performance within ranker_obj.py.

Looking at this Query

{ "nodes": {
    "on": { "ids": ["MONDO:0004979"] },
    "sn": { "categories": ["biolink:ChemicalEntity"] }
  },
  "edges": {
    "t_edge": {
      "subject": "sn",
      "object": "on",
      "knowledge_type": "inferred",
      "predicates": ["biolink:treats"]} 
  }
}

We find a strongly bimodal distribution of per-result execution times: most results score very quickly, while the remainder are very slow.
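
For reference, the per-result timings were gathered with something like the following (a minimal sketch; score_result is a stand-in for the per-result scoring path in ranker_obj.py, not an actual function there):

import time

def time_results(message, score_result):
    """Time the scoring of each result individually.

    `score_result` is a hypothetical callable that scores a single TRAPI
    result; it is not part of the real ranker_obj API.
    """
    timings = []
    for result in message["message"]["results"]:
        start = time.perf_counter()
        score_result(result)
        timings.append(time.perf_counter() - start)
    return timings

# Splitting at one second separates the two modes cleanly:
# fast = [t for t in timings if t < 1.0]
# slow = [t for t in timings if t >= 1.0]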

Digging into one of the slow answers, it looks fine on the surface.

{
  "node_bindings": {
    "on": [
      { "id": "MONDO:0004766", "qnode_id": "MONDO:0004979" },
      { "id": "MONDO:0004979", "qnode_id": "MONDO:0004979" }
    ],
    "sn": [{ "id": "PUBCHEM.COMPOUND:301590" }]
  },
  "analyses": [
    {
      "resource_id": "infores:aragorn",
      "edge_bindings": {
        "t_edge": [{ "id": "4f8c7da6-8771-4bd1-ba94-1fb820b9e910" }]
      },
      "support_graphs": ["OMNICORP_support_graph_9"]
    }
  ]
}

The treats edge looks normal, as do the two attribute edges and the web of edges that spawns from them (not shown):

{
  "4f8c7da6-8771-4bd1-ba94-1fb820b9e910": {
    "subject": "PUBCHEM.COMPOUND:301590",
    "object": "MONDO:0004979",
    "predicate": "biolink:treats",
    "sources": [
      {
        "resource_id": "infores:aragorn",
        "resource_role": "primary_knowledge_source"
      }
    ],
    "attributes": [
      {
        "attribute_type_id": "biolink:support_graphs",
        "value": [
          "7a2ecf78-c399-45cf-95f1-3e93185eae4a",
          "d9677631-66a2-4708-9b86-9df19c8353d3"
        ]
      }
    ]
  }
}
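
For anyone following along, the "web of edges" comes from chasing biolink:support_graphs attributes through the message-level auxiliary_graphs. A rough sketch of that expansion, assuming plain TRAPI 1.4 dictionaries (the function and argument names are mine, not ranker code):

def expand_support(edge_id, kg_edges, aux_graphs, seen=None):
    """Recursively collect every edge reachable via biolink:support_graphs.

    Assumes the standard TRAPI 1.4 layout: knowledge_graph.edges keyed by
    edge id, and message-level auxiliary_graphs, each with an "edges" list.
    """
    if seen is None:
        seen = set()
    if edge_id in seen:
        return seen
    seen.add(edge_id)
    edge = kg_edges[edge_id]
    for attribute in edge.get("attributes", []):
        if attribute.get("attribute_type_id") == "biolink:support_graphs":
            for graph_id in attribute["value"]:
                for support_edge_id in aux_graphs[graph_id]["edges"]:
                    expand_support(support_edge_id, kg_edges, aux_graphs, seen)
    return seen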

The problem comes from "OMNICORP_support_graph_9". Here is a small excerpt:

{
  "edges": [
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5",
    "a60dffc9-71ae-4c5c-ad71-d1864f61e0b5"
    ....
  ]
}

Within this support graph there are 11701 edges, although many are duplicates; 3385 of them are unique. Those edges in turn pull in a large number of nodes, 700+ by the end.
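
The counts above come from simple bookkeeping over the support graph; a sketch, again assuming plain TRAPI dictionaries:

from collections import Counter

def support_graph_stats(graph_id, aux_graphs, kg_edges):
    """Report total/unique edge counts and the node set for one support graph."""
    edge_ids = aux_graphs[graph_id]["edges"]
    counts = Counter(edge_ids)
    nodes = set()
    for edge_id in counts:
        edge = kg_edges[edge_id]
        nodes.add(edge["subject"])
        nodes.add(edge["object"])
    return {
        "total_edges": len(edge_ids),   # 11701 for OMNICORP_support_graph_9
        "unique_edges": len(counts),    # 3385
        "nodes": len(nodes),            # 700+
    }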

In sorting through all of this I noticed that ranker wasn't properly finding all of the edges and supporting evidence contained in the set. Bug fixes are in the branch ranker_speed_investigation.

After the bug fixes the score for this answer went from 0 to 0.93. It's not even clear if this is a good thing, given the evidence or reality.

I explored ranker_obj to investigate the impact of the duplicate edges; it is not much. The real issue is having 3000+ edges to parse through and then 700+ nodes. Even with 700+ nodes, the numpy calculations are still reasonably fast, about 30ms on my laptop. The real killer is traversing the web of edges spawning from all 3385 edges, collecting the evidence, and calculating weights. After the bug fixes the run time is largely unchanged.
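
To make the cost concrete, the expensive loop has roughly this shape: visit every unique support edge, pull its publication evidence, and turn it into a weight. This is a sketch of the shape only, not the actual ranker_obj code; the sigmoid is a placeholder for the publication-to-weight formula, and the attribute handling is simplified:

import math

def collect_weights(edge_ids, kg_edges):
    """Per-edge evidence gathering and weighting (placeholder math)."""
    weights = {}
    for edge_id in set(edge_ids):
        edge = kg_edges[edge_id]
        num_publications = 0
        for attribute in edge.get("attributes", []):
            # Assumes publication evidence lives under biolink:publications.
            if attribute.get("attribute_type_id") == "biolink:publications":
                num_publications = len(attribute["value"])
        # Placeholder weight: monotone in publication count.
        weights[edge_id] = 1.0 / (1.0 + math.exp(-num_publications))
    return weights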

This happens with different OMNICORP support graphs and appears to be the dominant symptom for results that take 1sec or more to score.

I believe this is some sort of OMNICORP issue. We should explore more performance optimizations within ranker, but nothing is obvious, so hopefully the OMNICORP bug fix will be enough.

cbizon commented 1 year ago

One aspect of omnicorp's behavior here is that the same graph, e.g. "OMNICORP_support_graph_1", is appearing in many results. So I think it's accumulating a bunch of support edges that don't have anything to do with each other and attaching them to many results. Suboptimal.
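
A quick way to check how widely each support graph is shared across results (a sketch, assuming the full TRAPI message is in hand):

from collections import Counter

def support_graph_usage(message):
    """Count how many analyses reference each auxiliary graph.

    A graph like OMNICORP_support_graph_1 appearing across many results
    suggests unrelated support edges are being pooled and attached wholesale.
    """
    usage = Counter()
    for result in message["results"]:
        for analysis in result.get("analyses", []):
            for graph_id in analysis.get("support_graphs", []):
                usage[graph_id] += 1
    return usage.most_common()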

cbizon commented 1 year ago

Actually, I see I made a PR for this a while ago... Needs a review though