microsoft / AzureTRE

An accelerator to help organizations build Trusted Research Environments on Azure.
https://microsoft.github.io/AzureTRE
MIT License

Add MLflow support #1227

Closed · daltskin closed this issue 2 years ago

daltskin commented 2 years ago

Is your feature request related to a problem? Please describe. As a researcher I need to be able to track my ML jobs

Describe the solution you'd like To use MLflow within a workspace or shared service

mjbonifa commented 2 years ago

A minimal scenario of a team of analysts working collaboratively on a machine learning project would be

In the scenario:

Remarks:

docker-vm

marrobi commented 2 years ago

@daltskin @mjbonifa my thoughts:

mjbonifa commented 2 years ago

> Can we use the local Jupyter instance on the per-user VMs as a first stab at working with MLflow?

Yes, simplify further to test this.

> Does the shared storage have to be a specific directory, or is any mounted storage location sufficient? How does MLflow know where to look?

MLflow is configured with a couple of parameters (see the MLflow docs), e.g.:

```shell
mlflow server --backend-store-uri sqlite:///mlruns.db --default-artifact-root /fs/mlruns
```

> Do we need an MLflow directory created and specified in advance?

Yes, the storage will need to be mounted when the MLflow server starts, and MLflow will need to be configured with where to write artifacts.
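As a rough sketch of what that implies, something like the following could mount-point setup and server start be used. The mount point, directory name, and port here are placeholders, not paths defined by AzureTRE:

```shell
# Sketch only: /mnt/shared and mlflow-artifacts are assumed names,
# not values from the AzureTRE templates.
MOUNT_POINT=/mnt/shared
mkdir -p "$MOUNT_POINT/mlflow-artifacts"

# Start the tracking server with run metadata in a local SQLite file and
# artifacts on the mounted share. --host 0.0.0.0 exposes the UI/API to
# other VMs on the workspace network (real mlflow server flags).
mlflow server \
  --backend-store-uri sqlite:///mlruns.db \
  --default-artifact-root "$MOUNT_POINT/mlflow-artifacts" \
  --host 0.0.0.0 \
  --port 5000
```

The artifact directory must exist and be mounted at the same path on every VM that logs artifacts, since clients write to it directly.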

> We will need an MLflow workspace service story

MLflow is per workspace, not per TRE. Please elaborate on what you need.

Other considerations:

MLflow has a user interface, so we will also need to consider how it is accessed via Guacamole.

marrobi commented 2 years ago

@mjbonifa that's useful, all good. Don't worry about the final two points.

As part of the MLflow workspace service we will need to create a directory within the shared storage share so users know where to write artifacts.

CalMac-tns commented 2 years ago

We have looked at the requirements and expanded them further:

  1. An instance of MLflow will be required per workspace.
  2. MLflow must be accessible from all VMs within a workspace.
  3. The instance of MLflow should be available on either Linux or Windows VMs:
     a. Quickstart — MLflow 1.23.1 documentation
     b. Use `mlflow ui` to start
     c. MLflow should load at http://localhost:5000/

Need to confirm that scenario 3 is the preferred option for installing MLflow (requires shared storage).
mjbonifa commented 2 years ago
  1. Agreed.
  2. Agreed.
  3. The `mlflow server` command starts a service with the UI at http://localhost:5000/ by default, so `mlflow ui` is not needed. For the MLflow tracking server it makes no difference whether it is deployed on Linux or Windows from an analyst's perspective, as they access the MLflow UI through a browser or the MLflow Python client API.
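To illustrate item 3 from the client side: analysts point their environment at the shared tracking server rather than running their own. The hostname below is a placeholder for whatever workspace-internal address the service ends up exposing:

```shell
# Placeholder address: substitute the workspace-internal hostname/IP of the
# MLflow VM. MLFLOW_TRACKING_URI is honoured by both the MLflow CLI and the
# Python client, so no code changes are needed per analyst.
export MLFLOW_TRACKING_URI="http://mlflow-host:5000"

# Any run logged from a workspace VM now goes to the shared server, e.g.:
# python -c "import mlflow; mlflow.log_metric('accuracy', 0.9)"
```

Because workspaces have no outbound internet access, this only works if the tracking server is reachable on the workspace's internal network, which matches requirement 2 above.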
daltskin commented 2 years ago

@mjbonifa @CalMac-tns I've split out #1290 so initial work can begin - does the acceptance criteria look ok to you?

marrobi commented 2 years ago

Work is complete, PR is merged.