opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Select the list of effector genes for "out-of-sample" validation #3528

Closed addramir closed 1 week ago

addramir commented 2 months ago

Related to https://github.com/opentargets/issues/issues/3500. As discussed before, we should select a list of gene-EFO pairs for additional validation of the resulting L2G scores. These out-of-sample effector genes will not be used to train the model. The current idea is to use Eric Fauman's list of genes, since we train only on our old curated list and ChEMBL.

addramir commented 2 months ago

The draft plan:
1) Take Eric Fauman's list (we are not using it for training).
2) Select the best CS as the gold positive and assign gold negatives, similar to what we do for training.
3) Use it to validate the model, e.g. count FP, FN, TP and TN using l2g >= 0.5. Compare it with the distance-only approach (closest gene by TSS) and the holistic approach.
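A rough sketch of step 3, to make the comparison concrete. It assumes each candidate gene at a locus is a record with an L2G score, a distance to the TSS and a gold-standard label; the `Candidate` class, its field names and the helper functions are hypothetical and not taken from the actual pipeline. The distance baseline here simply calls the single closest gene per locus a positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # All field names are assumptions made for illustration only.
    study_locus_id: str
    gene_id: str
    l2g_score: float
    distance_to_tss: int
    is_gold_positive: bool


def confusion_counts(labels, predictions):
    """Count TP, FP, FN and TN for paired boolean labels and predictions."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, predicted in zip(labels, predictions):
        if predicted and truth:
            counts["TP"] += 1
        elif predicted and not truth:
            counts["FP"] += 1
        elif truth:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def l2g_predictions(candidates, threshold=0.5):
    """Predict positive when the L2G score is at or above the threshold."""
    return [c.l2g_score >= threshold for c in candidates]


def closest_tss_predictions(candidates):
    """Predict positive only for the gene closest to the TSS at each locus.

    Ties are broken by keeping the first candidate seen.
    """
    closest = {}
    for c in candidates:
        current = closest.get(c.study_locus_id)
        if current is None or c.distance_to_tss < current.distance_to_tss:
            closest[c.study_locus_id] = c
    return [closest[c.study_locus_id] is c for c in candidates]


def compare_methods(candidates, threshold=0.5):
    """Return confusion counts for the L2G threshold and the distance baseline."""
    labels = [c.is_gold_positive for c in candidates]
    return {
        "l2g": confusion_counts(labels, l2g_predictions(candidates, threshold)),
        "closest_tss": confusion_counts(labels, closest_tss_predictions(candidates)),
    }
```

Calling `compare_methods(candidates)` on the validation candidates gives the two sets of counts side by side; a third method could be added in the same way once its predictions are available as a boolean list.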

addramir commented 1 week ago

This is v5.1_full for cross validation: gs://genetics-portal-dev-analysis/yt4/20241024_EGL_playground/training_set/v5_1_full.json

This is v5.1_validation for validation (== full - 5.1_training): gs://genetics-portal-dev-analysis/yt4/20241024_EGL_playground/training_set/v5_1_validation.json
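For illustration, the "full minus training" relationship can be sketched as a set difference over gene-locus pairs. The JSON schema of these files is not shown here, so the `studyLocusId` and `geneId` keys below are assumptions, as are the function names.

```python
def pair_key(record):
    """Identify a record by its (locus, gene) pair; the key names are assumed."""
    return (record["studyLocusId"], record["geneId"])


def validation_subset(full_records, training_records):
    """Keep the records from the full set whose pair is absent from training."""
    training_keys = {pair_key(r) for r in training_records}
    return [r for r in full_records if pair_key(r) not in training_keys]


def has_overlap(records_a, records_b):
    """Return True if any (locus, gene) pair appears in both record lists."""
    keys_a = {pair_key(r) for r in records_a}
    return any(pair_key(r) in keys_a for r in records_b)
```

A quick sanity check before validating would be that `has_overlap(validation, training)` is False, so that no training pair leaks into the out-of-sample set.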