Investigating an unexpected behaviour reported for the `enable indirect` API endpoint

buniello commented 1 year ago

Following up from this community post, we plan to investigate and fix the problem causing an unexpected behaviour for the enable indirect API endpoint. This endpoint is not exposed in the frontend, and therefore not extensively tested.

This is the example API query provided by the user:

```
query associatedDiseases {
  target(ensemblId: "ENSG00000160710") {
    id
    approvedSymbol
    associatedDiseases(enableIndirect: true) {
      count
      rows {
        score
        disease {
          id
          name
        }
        datasourceScores {
          id
          score
        }
        datatypeScores {
          id
          score
        }
      }
    }
  }
}
```

The output:

```
{
  "data": {
    "target": {
      "id": "ENSG00000160710",
      "approvedSymbol": "ADAR",
      "associatedDiseases": {
        "count": 11197,
        "rows": [
          {
            "score": 0.915009673664289,
            "disease": {
              "id": "EFO_0000222",
              "name": "acute myeloid leukemia"
            },
            "datasourceScores": [
              {
                "id": "chembl",
                "score": 0.9920171593572428
              },
              [...]
            ],
            "datatypeScores": [
              {
                "id": "known_drug",
```
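To compare direct and indirect results side by side, the same query can be sent with `enableIndirect` set to `false` and then `true`. The following is a minimal Python sketch, not a tested client: the endpoint URL, the variable types in the query, and the helper names `build_payload`, `run_query` and `datasource_ids` are assumptions made for illustration.

```python
import json
import urllib.request

# Assumption: the public Open Targets Platform GraphQL endpoint.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Same fields as the reported query, with the arguments passed as variables.
# Assumption: the variable types below match the API schema.
QUERY = """
query associatedDiseases($ensemblId: String!, $enableIndirect: Boolean!) {
  target(ensemblId: $ensemblId) {
    id
    approvedSymbol
    associatedDiseases(enableIndirect: $enableIndirect) {
      count
      rows {
        score
        disease { id name }
        datasourceScores { id score }
      }
    }
  }
}
"""


def build_payload(ensembl_id, enable_indirect):
    """Build the JSON body for a GraphQL POST request."""
    return {
        "query": QUERY,
        "variables": {"ensemblId": ensembl_id, "enableIndirect": enable_indirect},
    }


def run_query(ensembl_id, enable_indirect):
    """Send the query and return the decoded JSON response."""
    body = json.dumps(build_payload(ensembl_id, enable_indirect)).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def datasource_ids(result):
    """Collect every datasource id that appears in the returned rows."""
    rows = result["data"]["target"]["associatedDiseases"]["rows"]
    return {ds["id"] for row in rows for ds in row["datasourceScores"]}


# Example usage (requires network access):
# direct = run_query("ENSG00000160710", False)
# indirect = run_query("ENSG00000160710", True)
# print(sorted(datasource_ids(indirect) - datasource_ids(direct)))
```

Datasources that appear only in the indirect response are the ones worth checking against the evidence page. Note that the API returns a limited page of rows by default, so the comparison covers only the rows actually returned.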

First observations suggesting this is a backend problem:

1) In the example reported by the user, no ChEMBL evidence is displayed on the evidence page

2) A BigQuery query on the associationByDatasourceIndirect table does not return any indirect ChEMBL evidence either

BigQuery response

```
SELECT *
FROM `open-targets-eu-dev.platform_dev.associationByDatasourceIndirect`
WHERE targetId = "ENSG00000160710"
  AND diseaseId = "EFO_0000222"
LIMIT 10

datatypeId  datasourceId  diseaseId    targetId         score        evidenceCount
literature  europepmc     EFO_0000222  ENSG00000160710  0.656767905  6
```

d0choa commented 1 year ago

@JarrodBaker let us know if you do have the time to have a look at this.

d0choa commented 1 year ago

This endpoint will in principle be exposed in AOTF. Tagging

mbdebian commented 11 months ago

@d0choa , @remo87 , has this been looked into?

d0choa commented 10 months ago

This hasn't been fixed.

mbdebian commented 5 months ago

We need to confirm whether this still applies or not

d0choa commented 5 months ago

Yes it does, and it's important

remo87 commented 4 months ago

After discussing the findings of the code review, we're going to add descriptions to the arguments of the associations on the fly (AOTF) query, so that users know what to expect when they apply an argument. This is needed because the calculation differs between the scenario where a target is fixed and the scenario where a disease is fixed.

hendrikweisser commented 4 months ago

I don't know what you found in your code review and I appreciate better documentation, but I worry that this isn't addressing the main problem. I think (some of) the results for my example query in the bug report (https://community.opentargets.org/t/spurious-indirect-association-evidence-via-graphql-api/879) are flat-out wrong. Querying disease associations for the ADAR gene (with "enableIndirect: true"), I get very high scores from "chembl" ("datasourceScores") and "known_drug" ("datatypeScores"). But as far as I can see ChEMBL doesn't list any drugs targeting ADAR (UniProt ID P55265) and as far as I know there aren't any. I think it would be worth investigating the data behind these specific scores and checking if they actually make sense.

d0choa commented 4 months ago

What @remo87 is trying to document is the reason behind the behaviour you are observing.

When fixing a target entity (e.g. ADAR), the current enableIndirect: true will propagate the evidence in the protein-protein interaction network. That means that the association might be based on proteins interacting with ADAR and not ADAR itself.

This behaviour differs from that of enableIndirect: true when fixing a disease, in which case the evidence is propagated through the disease ontology.
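The difference between the two modes can be illustrated with a toy sketch. Everything in it is hypothetical: the target and disease names, the interaction and ontology maps, and the function names are placeholders, and it only models which evidence gets pooled, not how the platform computes scores.

```python
# Toy data, entirely hypothetical: placeholders, not real evidence.
DIRECT_EVIDENCE = {
    ("TARGET_A", "disease_parent"): {"literature"},
    ("TARGET_B", "disease_parent"): {"known_drug"},
    ("TARGET_A", "disease_child"): {"genetic_association"},
}
# Protein-protein interaction neighbours of each target.
INTERACTORS = {"TARGET_A": {"TARGET_B"}}
# Descendants of each disease in the disease ontology.
DESCENDANTS = {"disease_parent": {"disease_child"}}


def indirect_fixed_target(target, disease):
    """Target fixed: pool evidence from the target and its interactors."""
    pooled = set()
    for t in {target} | INTERACTORS.get(target, set()):
        pooled |= DIRECT_EVIDENCE.get((t, disease), set())
    return pooled


def indirect_fixed_disease(target, disease):
    """Disease fixed: pool evidence from the disease and its descendants."""
    pooled = set()
    for d in {disease} | DESCENDANTS.get(disease, set()):
        pooled |= DIRECT_EVIDENCE.get((target, d), set())
    return pooled
```

In this toy setup, `indirect_fixed_target("TARGET_A", "disease_parent")` picks up `known_drug` evidence that belongs to the interactor `TARGET_B`, which mirrors how a ChEMBL score can appear for a target with no drugs of its own. `indirect_fixed_disease` for the same pair instead picks up the `genetic_association` evidence attached to the child disease.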

As discussed before, this behaviour is not exploited in the UI and the data dumps only capture the direct/indirect propagation of evidence through the disease ontology. For now, we are documenting the API endpoint to prevent more confusion. We have several streams of work to better exploit the interaction data since we know it's a relevant strategy to identify disease-relevant targets.

Does this make sense?

hendrikweisser commented 4 months ago

> When fixing a target entity (e.g. ADAR), the current enableIndirect: true will propagate the evidence in the protein-protein interaction network.

@d0choa, thank you for clarifying. I was not aware of this at all.

Is there any way (through the API) to get the information that I thought I was querying, i.e. disease associations for a given target, but with evidence propagated through the disease ontology? Often this is what you want, e.g. for an association with "breast cancer" to take into account evidence for all subtypes of breast cancer.

d0choa commented 4 months ago

Unfortunately, not through the API. We can have a look, but we might run into performance issues. To compute this, we would need to propagate evidence through the ontology for every row in the heatmap, not just for the fixed entity as is currently implemented.

If you work with "breast cancer" and this is your fixed entity, what you are describing is the default behaviour. You are looking at "breast cancer" and all the subtypes of breast cancer. Doing the same for the 360 diseases associated with ADAR at the same time is what is not available in the API.

It's a lot easier to do these types of queries with the data dumps, but I understand this is not your use case.

hendrikweisser commented 4 months ago

Understood. Could it work for a single target-disease association at a time (e.g. "ADAR - breast cancer")?

opentargets / issues

Investigating an unexpected behaviour reported for the `enable indirect` API endpoint #2841