The prototype is currently being developed in this repository. So far, the following runs have been done:
I'm currently finalising my investigation & configuration updates following the v5 run.
@tskir, @ireneisdoomed and I have been working on how to bring your efforts to production.
There is now a Docker image of gentropy that GitHub Actions builds every time we merge something into dev. CI/CD uploads that image to the Google Artifact Registry. We have successfully used this image to run the fine-mapping step in Google Batch. You can find the image at this path: `europe-west1-docker.pkg.dev/open-targets-genetics-dev/gentropy-app/gentropy:dev`.
Irene has an advanced draft of an Airflow DAG that generates the to-do list of loci to finemap and submits the Google Batch job for the pending tasks (incremental pipeline). Following your strategy, we have confirmed we can run 3 finemapping tasks in parallel by parametrising the `studyLocusId`, but we haven't yet fine-tuned the batch submission based on your findings (machine type, spot machines, parallelism, etc.).
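The incremental to-do list described above boils down to a set difference: every candidate `studyLocusId` minus those that already have fine-mapping results. Below is a minimal sketch under that assumption. The function names, the example IDs, and the idea of splitting the pending loci into fixed-size chunks (for example, one Batch job per chunk) are hypothetical and not taken from Irene's DAG.

```python
from __future__ import annotations

from typing import Iterable


def pending_loci(all_loci: Iterable[str], finemapped: Iterable[str]) -> list[str]:
    """Return studyLocusIds that have not been fine-mapped yet (hypothetical helper)."""
    done = set(finemapped)
    seen: set[str] = set()
    todo: list[str] = []
    # Preserve input order and drop duplicates so reruns produce a stable to-do list.
    for locus in all_loci:
        if locus not in done and locus not in seen:
            seen.add(locus)
            todo.append(locus)
    return todo


def chunk(items: list[str], size: int) -> list[list[str]]:
    """Split the to-do list into batches of at most `size` loci (assumed batching scheme)."""
    if size < 1:
        raise ValueError("size must be positive")
    return [items[i : i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    all_ids = ["locus_1", "locus_2", "locus_3", "locus_4", "locus_5"]
    already_done = ["locus_2", "locus_4"]
    todo = pending_loci(all_ids, already_done)
    print(todo)            # ['locus_1', 'locus_3', 'locus_5']
    print(chunk(todo, 2))  # [['locus_1', 'locus_3'], ['locus_5']]
```

Keeping the diff order-stable makes it easier to compare successive incremental runs.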
Irene will open a draft PR for the DAG so we can discuss the approach. We will need to parametrise the Google Batch job based on your findings. Hopefully, these efforts will help build a wrapper around your work. I just wanted to mention this so you can focus on running and maximising performance without worrying too much about productisation for now.
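To make "parametrise the Google Batch job" concrete, here is a sketch of a Batch job body exposing the knobs mentioned above (machine type, spot machines, parallelism) plus a retry count. Assumptions: field names follow the Batch v1 REST `Job` resource as we understand it (`taskEnvironments`, `parallelism`, `maxRetryCount`, `provisioningModel`). The `STUDY_LOCUS_ID` variable name, the machine type in the usage example, and the shell command are hypothetical placeholders, not the actual gentropy step invocation.

```python
from __future__ import annotations

import json

# Image path from this thread; every other value below is an assumption.
IMAGE_URI = "europe-west1-docker.pkg.dev/open-targets-genetics-dev/gentropy-app/gentropy:dev"


def build_batch_job(
    study_locus_ids: list[str],
    *,
    machine_type: str,
    use_spot: bool = True,
    parallelism: int = 3,
    max_retries: int = 3,
) -> dict:
    """Build a hypothetical Google Batch job body with one task per studyLocusId."""
    if not study_locus_ids:
        raise ValueError("nothing to fine-map")
    return {
        "taskGroups": [
            {
                # One environment mapping per task, matched to tasks by index.
                # taskCount is omitted: Batch derives it from len(taskEnvironments).
                "taskEnvironments": [
                    {"variables": {"STUDY_LOCUS_ID": locus}} for locus in study_locus_ids
                ],
                "parallelism": parallelism,
                "taskSpec": {
                    "runnables": [
                        {
                            "container": {
                                "imageUri": IMAGE_URI,
                                # Placeholder command; assumes /bin/sh exists in the image.
                                "entrypoint": "/bin/sh",
                                "commands": ["-c", "echo fine-mapping $STUDY_LOCUS_ID"],
                            }
                        }
                    ],
                    "maxRetryCount": max_retries,
                },
            }
        ],
        "allocationPolicy": {
            "instances": [
                {
                    "policy": {
                        "machineType": machine_type,
                        "provisioningModel": "SPOT" if use_spot else "STANDARD",
                    }
                }
            ]
        },
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


if __name__ == "__main__":
    job = build_batch_job(["locus_1", "locus_2", "locus_3"], machine_type="n2-standard-4")
    with open("job.json", "w") as fh:
        json.dump(job, fh, indent=2)
```

The resulting `job.json` could then be submitted with something like `gcloud batch jobs submit JOB_NAME --location=europe-west1 --config=job.json`, or from an Airflow operator that accepts the same job body.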
@ireneisdoomed (draft) PR for reference: https://github.com/opentargets/gentropy/pull/581
Yes, I second what David mentioned. Please have a look at the Missing section. There are things yet to be determined, but most importantly we want to reproduce @tskir's findings in the DAG.
Once that is done, I'd like to fine-map the loci in the Alzheimer's study to compare performance between the dockerised approach and the more naive one.
@d0choa @ireneisdoomed Thank you for working on the orchestration part! Following the v6 run and my investigation of its results, I have posted updates in three separate issues: https://github.com/opentargets/issues/issues/3314 for the retry policy, https://github.com/opentargets/issues/issues/3315 for resource usage, and https://github.com/opentargets/issues/issues/3316 for run monitoring facilities (which might be useful when you start doing large production runs in Airflow).
Overall, I feel we are very close to a stable run configuration, and we can soon start migrating the configuration to a Docker + Airflow setup.
Everything is done here: the finemapping orchestration has been implemented, first as a draft and then as an increasingly stable, production-ready solution. The final update was merging this pull request into the orchestration repo: https://github.com/opentargets/orchestration/pull/10
This epic tracks progress on implementing a prototype of parallel finemapping computation.