Open cbizon opened 2 years ago
Interestingly this query: https://arax.ncats.io/?source=ARS&id=bf04c388-b4d2-482e-9ddc-abb92c6c81c8
which is the same but uses "ChemicalEntity", produces much nicer results. I think that's because the original NamedThing query sets the denominator of the enrichment to something giant, so even things like "disease" get linked in. Maybe we need some kind of dynamic denominator.
A similar issue can happen with e.g. chemicals. CHEBI is a subset of chemicals, but it has subclasses in it. If you use "all chemicals" as the denominator size, and you have more CHEBIs than would be expected at random (which is reasonable, given that CHEBI contains the 'most interesting' or at least most annotated chemicals), then it will look like you've chosen a meaningful set of chemicals, because they all descend from some high-level chemical class.
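To make the denominator effect concrete, here is a minimal sketch that assumes the enrichment is a one-sided hypergeometric test (that is an assumption about the scoring, not a description of the actual code). The function name and all of the counts are made up for illustration.

```python
from math import comb


def enrichment_p_value(universe_size, linked_in_universe, input_size, linked_in_input):
    """One-sided hypergeometric tail: P(X >= linked_in_input).

    universe_size      -- the denominator: how many nodes could have been drawn
    linked_in_universe -- how many of those are linked to the candidate node
    input_size         -- how many input nodes the query supplied
    linked_in_input    -- how many input nodes are linked to the candidate node
    """
    total = comb(universe_size, input_size)
    upper = min(linked_in_universe, input_size)
    tail = sum(
        comb(linked_in_universe, k)
        * comb(universe_size - linked_in_universe, input_size - k)
        for k in range(linked_in_input, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 inputs, 5 of them linked to a candidate node
# that has 500 links overall. Only the denominator changes.
p_named_thing = enrichment_p_value(1_000_000, 500, 20, 5)   # "NamedThing"-sized universe
p_chemical_entity = enrichment_p_value(10_000, 500, 20, 5)  # "ChemicalEntity"-sized universe
print(p_named_thing, p_chemical_entity)
```

With the giant denominator the same overlap looks far more significant, which is how very generic nodes can end up passing the enrichment cutoff.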
I'm probably overthinking some of this. Our edges are based on what's in our local graph. So the denominators should be based on that, and we should just ignore edges that don't occur in that graph. There are perhaps other approaches but this is the most straightforward. So the main thing to do is first remove any answers that don't occur in our local graph.
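A rough sketch of that filtering step, assuming the local graph can be summarized as a mapping from node identifier to its set of categories. The function name, the mapping, and the identifiers below are all hypothetical.

```python
def restrict_to_local_graph(answer_ids, local_node_categories, category):
    """Drop answers that are not in the local graph and compute the denominator.

    answer_ids            -- identifiers of the answers for one query (hypothetical input)
    local_node_categories -- dict: node identifier -> set of categories in the local graph
    category              -- category the query asked for, e.g. "ChemicalEntity"

    Returns (kept_answers, denominator), where the denominator counts only
    local-graph nodes that carry the requested category.
    """
    kept = [a for a in answer_ids if a in local_node_categories]
    denominator = sum(
        1 for cats in local_node_categories.values() if category in cats
    )
    return kept, denominator


# Made-up identifiers, for illustration only.
local_nodes = {
    "CHEBI:1": {"ChemicalEntity", "NamedThing"},
    "CHEBI:2": {"ChemicalEntity", "NamedThing"},
    "GENE:1": {"Gene", "NamedThing"},
}
kept, denominator = restrict_to_local_graph(
    ["CHEBI:1", "CHEBI:999", "GENE:1"], local_nodes, "ChemicalEntity"
)
print(kept, denominator)
```

Here the denominator follows the requested category, so a "NamedThing" query would count every local node while a "ChemicalEntity" query counts only the chemicals.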
https://arax.ncats.io/?source=ARS&id=e2800952-aae8-4605-97ce-4cfbc596934e
The query is https://github.com/NCATSTranslator/testing/blob/main/ars-requests/not-none/1.2/risk.json
There are numerous results I don't like, such as "disease" and "blood".
It's also systematically preferring gene answers to chemical answers. Is that OK? Maybe.
Also, the first hits are things that are near-synonyms of the input. This isn't wrong, it's right, but it's not terribly helpful.