opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

L2G model evaluation is very slow #3263

Open ireneisdoomed opened 3 months ago

ireneisdoomed commented 3 months ago

Describe the bug: Running the L2G training step and logging the results to the W&B dashboard takes ~2 hours.

Observed behaviour: The changes in https://github.com/opentargets/gentropy/pull/544 removed some data caching steps to avoid memory issues. This has had an impact on experiment logging. I ran L2G on the development machine (single node) and it took over 2h.

I didn't follow the process in the Spark UI, but I did notice:

  1. Training took ~1h
  2. After training was complete, evaluating the model triggered another training step because the process was not checkpointed.
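The second observation can be illustrated with a toy, Spark-free sketch. It is not gentropy code: `LazyStage`, `expensive_training` and `run` are hypothetical names, and the class only imitates how a lazily evaluated pipeline repeats an unpersisted stage on every action. It assumes the slowdown comes from that kind of recomputation, which has not been confirmed in the Spark UI.

```python
from typing import Callable, List, Optional


class LazyStage:
    """Toy stand-in for a lazily evaluated pipeline stage (not a Spark API)."""

    def __init__(self, compute: Callable[[], List[float]]) -> None:
        self._compute = compute
        self._keep = False
        self._result: Optional[List[float]] = None
        self.runs = 0  # number of times the expensive computation actually ran

    def persist(self) -> "LazyStage":
        """Request that the result is kept after the first action computes it."""
        self._keep = True
        return self

    def collect(self) -> List[float]:
        """Action: return the result, recomputing it unless it was kept."""
        if self._result is not None:
            return self._result
        self.runs += 1
        result = self._compute()
        if self._keep:
            self._result = result
        return result


def expensive_training() -> List[float]:
    # Hypothetical placeholder for the ~1h training stage.
    return [0.1, 0.7, 0.9]


def run(stage: LazyStage) -> int:
    """Trigger two actions on the same stage and report how often it ran."""
    predictions = stage.collect()                   # action 1: "training"
    _ = sum(stage.collect()) / len(predictions)     # action 2: "evaluation"
    return stage.runs


uncached_runs = run(LazyStage(expensive_training))
persisted_runs = run(LazyStage(expensive_training).persist())
print(uncached_runs, persisted_runs)
```

In PySpark the comparable calls would be `DataFrame.cache()` / `DataFrame.persist()` or `DataFrame.checkpoint()` (after `SparkContext.setCheckpointDir`). Whether any of these can be reintroduced without bringing back the memory issues addressed in the PR above is untested.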

Expected behaviour: If training takes ~30 minutes when we run the step from Airflow (and without model evaluation), the process with the evaluation part should take a similar amount of time.

To Reproduce Steps to reproduce the behaviour:

  1. Create dev environment: make create-dev-cluster
  2. Tweak configuration ot_locus_to_gene_train.yaml and set wandb_run_name
  3. Run step: gentropy --config-dir="/config" --config-name="ot_config.yaml" step=ot_locus_to_gene_train