ranking-agent / aragorn-ranker

Exposes TRAPI functions to add literature co-occurrence edges, convert publications to edge weights, and provide scores for answers.
MIT License

Update omnicorp build to handle pubchem/chebi better. #19

Open cbizon opened 3 years ago

cbizon commented 3 years ago

Consider this query:

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "id": "UniProtKB:P52788",
                    "category": "biolink:Gene"
                },
                "n1": {
                    "category": "biolink:ChemicalSubstance"
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}

Currently, this returns a bunch of chemicals, normalized to PubChem ids. Omnicorp knows about PubChem ids, but, I guess because the names are different in PubChem, there are lots of cases where we have results in omnicorp for the CHEBI id but not for the PubChem id.

Originally, I was thinking that omnicorp overlay should look in the equivalent identifiers on the input graph and query the cache/postgres for those identifiers as well.

But I think that's wrong. First, you only get back counts, so if you get results for 2 equivalent identifiers, there's no good way to combine them or decide between them. Second, it means a lot of (probably repeated) double querying. Now I think we should resolve this upstream, when we build the omnicorp database and cache. All we need to do is normalize identifiers at the point where we still have the actual PubMed ids, so that we can combine things.
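To make the argument concrete, here is a minimal sketch of the build-time merge, not the actual omnicorp build code. It assumes the raw data is available as (curie, pmid) pairs and that an equivalence map to preferred identifiers already exists; the names `PREFERRED_ID`, `merge_pmids`, and `shared_count`, and all identifiers and PubMed ids, are made up for illustration.

```python
from collections import defaultdict

# Hypothetical equivalence map: every known identifier -> preferred identifier.
# In practice this would come from whatever normalization the build uses.
PREFERRED_ID = {
    "CHEBI:EXAMPLE1": "PUBCHEM.COMPOUND:EXAMPLE1",
    "PUBCHEM.COMPOUND:EXAMPLE1": "PUBCHEM.COMPOUND:EXAMPLE1",
}


def merge_pmids(raw_rows, preferred):
    """Group PubMed ids under the preferred identifier of each curie.

    Because we still hold the actual PubMed ids here, equivalent
    identifiers are combined with a set union, so a paper mentioned
    under both ids is only counted once.
    """
    merged = defaultdict(set)
    for curie, pmid in raw_rows:
        # Identifiers missing from the map are kept unchanged.
        merged[preferred.get(curie, curie)].add(pmid)
    return dict(merged)


def shared_count(merged, curie_a, curie_b):
    """Number of PubMed ids that mention both (normalized) identifiers."""
    return len(merged.get(curie_a, set()) & merged.get(curie_b, set()))


# Made-up rows: the same chemical appears under two identifiers,
# and PubMed id 2 is recorded under both of them.
rows = [
    ("CHEBI:EXAMPLE1", 1),
    ("CHEBI:EXAMPLE1", 2),
    ("PUBCHEM.COMPOUND:EXAMPLE1", 2),
    ("PUBCHEM.COMPOUND:EXAMPLE1", 3),
    ("UniProtKB:EXAMPLE", 2),
    ("UniProtKB:EXAMPLE", 3),
]
merged = merge_pmids(rows, PREFERRED_ID)
```

With counts alone, the two chemical identifiers would each report 2 papers, and there would be no way to tell whether the true total is 2, 3, or 4. Merging the sets first gives 3, and co-occurrence counts can then be computed from the merged sets.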

The downside of this approach is that it will tie the cache to the normalization and biolink prefix ordering.

cbizon commented 2 years ago

This needs to be fixed. Looking at results from our benchmarks (particularly furosemide vs edema), we are not getting any furosemide omnicorp results because of this issue. In particular, furosemide has 2 PUBCHEM.COMPOUND ids: the one in omnicorp and the one that we are querying over.