opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Implement parallel finemapping computation #3302

Open tskir opened 2 months ago

tskir commented 2 months ago

This epic is to track progress on implementing a prototype of parallel finemapping computation.

tskir commented 2 months ago

The prototype is currently being developed in this repository. So far, the following runs have been done:

I'm currently finalising my investigation & configuration updates following the v5 run.

d0choa commented 2 months ago

@tskir, @ireneisdoomed and I have been working on how to bring your efforts to production.

There is now a Docker image of gentropy that GitHub Actions builds every time we merge something to dev. CI/CD uploads that image to the Google Artifact Registry. We have successfully used this image to run the fine-mapping step in Google Batch. You can find the image at this path: europe-west1-docker.pkg.dev/open-targets-genetics-dev/gentropy-app/gentropy:dev.

Irene has an advanced draft of an Airflow DAG that generates the to-do list of loci to fine-map and submits the Google Batch job for the pending tasks (incremental pipeline). We have confirmed that we can run 3 fine-mapping tasks in parallel by parametrising the studyLocusId, following your strategy, but we haven't yet fine-tuned the batch submission based on your findings (machine type, spot machines, parallelism, etc.).
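As a rough illustration of the incremental to-do list idea (this is not the code in the draft DAG; the function names, the example IDs, and the command format are all hypothetical), the pending tasks can be thought of as the set difference between all loci and those already fine-mapped, with one parametrised task per remaining studyLocusId:

```python
from typing import Iterable


def pending_loci(all_loci: Iterable[str], done_loci: Iterable[str]) -> list[str]:
    """Return the loci that still need fine-mapping, deduplicated and sorted."""
    done = set(done_loci)
    return sorted({locus for locus in all_loci if locus not in done})


def task_commands(loci: list[str]) -> list[list[str]]:
    """Build one command per locus.

    The command layout is a placeholder (assumption): the real gentropy
    step name and options may differ.
    """
    return [["finemap", f"studyLocusId={locus}"] for locus in loci]


# Example with made-up IDs: locus "102" was already processed in an earlier run.
todo = pending_loci(["101", "102", "103", "104"], ["102"])
commands = task_commands(todo)
```

Sorting the result keeps the task order stable between runs, which makes it easier to match a task index back to a locus when inspecting logs.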

Irene will open a draft PR for the DAG and we can discuss the approach. We will need to parametrise the Google Batch job based on your findings. Hopefully, these efforts will help build a wrapper around your work. I just wanted to mention this so that you can focus on running and maximising performance, and not worry too much about the productisation for now.
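To make "parametrise the Google Batch job" a bit more concrete, here is a minimal sketch of a job description built from the tunable settings mentioned above (machine type, spot machines, parallelism, retries). It only assembles a plain dictionary following my understanding of the Batch job JSON layout; the helper name is hypothetical, the example values are placeholders rather than tuned findings, and the field names should be checked against the Batch documentation before use.

```python
import json


def build_batch_job(
    image_uri: str,
    task_count: int,
    parallelism: int,
    machine_type: str,
    use_spot: bool,
    max_retries: int,
) -> dict:
    """Assemble a Batch-style job description as a plain dict (illustrative only)."""
    return {
        "taskGroups": [
            {
                "taskCount": task_count,
                # Never request more parallel tasks than there are tasks.
                "parallelism": min(parallelism, task_count),
                "taskSpec": {
                    "maxRetryCount": max_retries,
                    # No command is set, so the image's default entrypoint would run.
                    "runnables": [{"container": {"imageUri": image_uri}}],
                },
            }
        ],
        "allocationPolicy": {
            "instances": [
                {
                    "policy": {
                        "machineType": machine_type,
                        "provisioningModel": "SPOT" if use_spot else "STANDARD",
                    }
                }
            ]
        },
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


# Placeholder values (assumptions), not the tuned configuration.
job = build_batch_job(
    image_uri="europe-west1-docker.pkg.dev/open-targets-genetics-dev/gentropy-app/gentropy:dev",
    task_count=3,
    parallelism=10,
    machine_type="n2-standard-4",
    use_spot=True,
    max_retries=2,
)
job_json = json.dumps(job, indent=2)
```

Keeping every tunable as a function argument means the DAG can pass in whatever values come out of the run investigations without touching the job template itself.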

d0choa commented 2 months ago

@ireneisdoomed's draft PR, for reference: https://github.com/opentargets/gentropy/pull/581

ireneisdoomed commented 2 months ago

Yes, I second what David has mentioned. Please have a look at the Missing section. There are things yet to be determined, but most importantly we want to reproduce @tskir's findings in the DAG. Once that is done, I'd like to fine-map the loci in the Alzheimer's study to compare performance between the dockerised approach and the more naive approach.

tskir commented 1 month ago

@d0choa @ireneisdoomed Thank you for working on the orchestration part! Following the v6 run and my investigation of its results, I have provided updates in three separate issues: https://github.com/opentargets/issues/issues/3314 for retry policy, https://github.com/opentargets/issues/issues/3315 for resource usage, and https://github.com/opentargets/issues/issues/3316 for run monitoring facilities (this might be useful when you start doing large production runs in Airflow).

Overall, I feel we are very close to a stable run configuration, and we can start migrating the configuration to a Docker + Airflow setup soon.