This repository serves as a wrapper around the Madminer physics workflow and the Madminer ML workflow, combining them into a single, linked workflow to be executed in REANA.
Both workflows are defined as Git submodules in this repository. Submodules let us combine the contents of separate repositories when both are needed for a complex operation, while keeping them as distinct projects.
The workflow specification is composed of two sub-workflows: the physics workflow and the ML workflow.
The combined workflow has this shape:
To install all the source code that is necessary to operate with this project:
git clone --recurse-submodules https://github.com/madminer-tool/madminer-workflow
For cases where the project has already been cloned:
git submodule update --init --recursive
The repositories defined as submodules follow their own development pace. When the submodule repositories have been updated on GitHub and you want to propagate those changes to your local copy:
git submodule update --remote
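To check the state of the submodules before or after updating them, `git submodule status` prefixes each line with a state character. The sketch below interprets that prefix. The function name `submodule_state` is hypothetical and used only for illustration.

```shell
# Interpret the first character of one line of `git submodule status`.
submodule_state() {
  case "$1" in
    -*) echo "not initialized" ;;
    +*) echo "differs from recorded commit" ;;
    U*) echo "merge conflict" ;;
    *)  echo "in sync" ;;
  esac
}

# Usage, from the repository root:
#   git submodule status | while IFS= read -r line; do
#     printf '%s: %s\n' "$(submodule_state "$line")" "$line"
#   done
```

After `git submodule update --remote`, a `+` prefix is expected until the new submodule commits are committed in this repository.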
The MLFlow framework has been integrated with some steps of the workflow to keep track of each run's initial parameters, results, and generated artifacts.
To deploy your own tracking server locally:
# Deploy local tracking server
mlflow server \
--host "0.0.0.0" \
--port 5000 \
--workers 2 \
--backend-store-uri "file:///tmp/mlflow/runs/metadata" \
--default-artifact-root "file:///tmp/mlflow/runs/artifacts"
# Specify server URL to interact with it
export MLFLOW_TRACKING_URI="http://0.0.0.0:5000"
# Create experiments to avoid race conditions on parallelized steps.
mlflow experiments create --experiment-name "madminer-ml-sample"
mlflow experiments create --experiment-name "madminer-ml-train"
mlflow experiments create --experiment-name "madminer-ml-eval"
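The three `create` calls above can also be written as a loop, which keeps the experiment names in one place. This is a minimal sketch: `create_experiments` and the `MLFLOW_BIN` override are hypothetical names (not MLFlow settings), introduced only so the function can be dry-run without a server.

```shell
# Create one MLFlow experiment per parallelized workflow stage.
# MLFLOW_BIN is a hypothetical variable used here only to allow a dry run.
create_experiments() {
  for stage in sample train eval; do
    ${MLFLOW_BIN:-mlflow} experiments create \
      --experiment-name "madminer-ml-${stage}" || return 1
  done
}

# Dry run: print the commands instead of executing them.
#   MLFLOW_BIN=echo create_experiments
# Real run (requires MLFLOW_TRACKING_URI to point at the server):
#   create_experiments
```

The function stops at the first command that fails, so a partially created set of experiments is reported rather than silently ignored.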
The full workflow can be launched using Yadage. Yadage is a YAML-based specification language layered over a set of utilities for coordinating workflows. Be aware that Yadage workflows can be hard to define, as the Yadage documentation is incomplete. To learn about Yadage's less documented features, contact Lukas Heinrich, the creator of Yadage.
Yadage execution depends on both Docker environment images (physics and ML) having already been pushed. If they have not, please follow the instructions in the Madminer physics workflow and Madminer ML workflow repositories.
Once the Docker images are available on DockerHub, run locally:
export MLFLOW_TRACKING_URI="http://host.docker.internal:5000"
export PACKTIVITY_DOCKER_CMD_MOD="--add-host host.docker.internal:host-gateway" # Linux only
make yadage-run
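Since the `PACKTIVITY_DOCKER_CMD_MOD` override is needed on Linux only, the export can be made conditional on the platform. The following is a sketch, assuming `uname -s` reports `Linux` on the hosts where the override applies. The helper name `docker_cmd_mod_for` is hypothetical.

```shell
# Print the extra Docker arguments needed for a given kernel name.
docker_cmd_mod_for() {
  case "$1" in
    Linux) echo "--add-host host.docker.internal:host-gateway" ;;
    *)     echo "" ;;
  esac
}

# Export the override only when the current platform needs it.
cmd_mod="$(docker_cmd_mod_for "$(uname -s)")"
if [ -n "$cmd_mod" ]; then
  export PACKTIVITY_DOCKER_CMD_MOD="$cmd_mod"
fi
```

With this in place, the same setup lines can be shared between Linux and non-Linux machines before calling `make yadage-run`.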
To debug the workflow locally using REANA, first install Docker and the kind CLI tool (Kubernetes in Docker) to deploy a local cluster.
Please follow the local deployment documentation to set up REANA.
To start the workflow:
$ source ~/.virtualenvs/reana/bin/activate
(reana) $ eval $(reana-dev client-setup-environment)
(reana) $ export REANA_WORKON=madminer-workflow
(reana) $ export MLFLOW_TRACKING_URI=http://host.docker.internal:5000
(reana) $ make reana-run
If you have access to a remote REANA cluster and want to deploy there, set the environment variables yourself:
$ source ~/.virtualenvs/reana/bin/activate
(reana) $ export REANA_ACCESS_TOKEN=[..]
(reana) $ export REANA_SERVER_URL=[..]
(reana) $ export REANA_WORKON=madminer-workflow
(reana) $ export MLFLOW_TRACKING_URI=<tracking_server_url>
(reana) $ make reana-run
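Before calling `make reana-run` against a remote cluster, it can help to confirm that all four variables are set. A minimal sketch follows. `missing_reana_vars` is a hypothetical helper, not part of `reana-client`.

```shell
# Print the name of every required variable that is unset or empty.
missing_reana_vars() {
  for name in REANA_ACCESS_TOKEN REANA_SERVER_URL REANA_WORKON MLFLOW_TRACKING_URI; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Usage:
#   missing="$(missing_reana_vars)"
#   [ -z "$missing" ] || { echo "Unset: $missing" >&2; exit 1; }
```

The check only verifies that the variables are non-empty. It does not validate the token or contact the server.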
The workflow might take some time to finish, depending on the job and the cluster. Once it does, list and download the files:
(reana) $ reana-client ls
(reana) $ reana-client download <path/to/file/on/reana/workon>