opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Benchmarking of features from gentropy #3301

Open addramir opened 2 months ago

addramir commented 2 months ago

This is a follow-up to the discussion about exposing gentropy features separately in OT, not just combined within L2G, and about benchmarking them. Using L2G AUC is a suboptimal, arguably incorrect, way of benchmarking features.

Background

There is a current pipeline of combining L2G results from several studies within one EFO:

  1. Filter CS by p-value (1e-8 according to the documentation, which differs from the clumping threshold in production (!))
  2. Take the harmonic average of L2G for each gene
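The two steps above could be sketched roughly as follows. This is only an illustration, not the production code: the row keys (`efo_id`, `gene_id`, `pvalue`, `l2g_score`) and the function name `aggregate_l2g` are hypothetical, the `<=` comparison is an assumption, and "harmonic average" is taken literally as the harmonic mean. If production uses a different aggregation, it should be swapped in.

```python
from collections import defaultdict
from statistics import harmonic_mean


def aggregate_l2g(rows, pvalue_threshold=1e-8):
    """Combine per-study L2G scores into one score per (EFO, gene).

    `rows` is an iterable of dicts with the hypothetical keys
    efo_id, gene_id, pvalue and l2g_score (one row per CS/gene pair).
    """
    scores = defaultdict(list)
    for row in rows:
        # Step 1: keep only rows passing the p-value filter (assumed inclusive).
        if row["pvalue"] <= pvalue_threshold:
            scores[(row["efo_id"], row["gene_id"])].append(row["l2g_score"])
    # Step 2: harmonic mean of the surviving L2G scores for each (EFO, gene).
    return {key: harmonic_mean(values) for key, values in scores.items()}
```

Keeping the threshold as a parameter makes it cheap to rerun the aggregation with different cut-offs and compare the downstream results.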

The pipeline is not part of gentropy; however, it is unfair to hide these transformations of the data, because no one has benchmarked them in terms of drug target prediction (?). Having this data allows us to run a few very straightforward tests, for example a Matthew R. Nelson-like approach to estimating the impact of genetic evidence on clinical success.
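As a rough illustration of such a test, one option is to compare the clinical success rate of target-indication pairs with genetic support against those without it, as a ratio of the two probabilities. The sketch below assumes that framing. The function name `relative_success` and the input format are hypothetical, and a real analysis would also need confidence intervals and a careful definition of "support" and "success".

```python
def relative_success(pairs):
    """Ratio of success rates: P(success | support) / P(success | no support).

    `pairs` is an iterable of (has_genetic_support, succeeded) booleans,
    one per target-indication pair (hypothetical input format).
    """
    # counts[support][outcome], indexed by the two booleans.
    counts = {True: {True: 0, False: 0}, False: {True: 0, False: 0}}
    for has_support, succeeded in pairs:
        counts[bool(has_support)][bool(succeeded)] += 1

    n_supported = sum(counts[True].values())
    n_unsupported = sum(counts[False].values())
    if n_supported == 0 or n_unsupported == 0:
        raise ValueError("need pairs both with and without genetic support")

    p_supported = counts[True][True] / n_supported
    p_unsupported = counts[False][True] / n_unsupported
    if p_unsupported == 0:
        raise ValueError("no successes among unsupported pairs; ratio undefined")
    return p_supported / p_unsupported
```

Here "support" could be defined by thresholding either the aggregated L2G score or a single gentropy feature, which is what would make per-feature comparisons possible.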

Having this pipeline in hand allows us to:

  1. Benchmark each gentropy feature separately, without combining it within the L2G network
  2. Experiment with the p-value threshold and some cut-offs for feature creation
  3. Estimate the influence of sample size and data quality
  4. Preselect features for L2G
  5. Produce production-ready results

I have a very strong feeling that this pipeline should be part of gentropy too, and that the genetics team should be responsible for it.

Tasks

Acceptance tests

This is an important discussion, and I feel everyone should be involved in it.

d0choa commented 2 months ago

For context, there are several pieces in our ecosystem that interface with this topic:

None of the above solutions is exactly what you are asking for, but they are attempts in a similar direction. We should discuss the best way to consolidate our efforts.