mlflow / mlflow

Open source platform for the machine learning lifecycle
https://mlflow.org
Apache License 2.0
18.33k stars, 4.14k forks

[BUG] Running a MLflow project with docker_env fails to create the docker container. #2501

Open Grisly00 opened 4 years ago

Grisly00 commented 4 years ago

System information

Describe the problem

The example MLflow project (and my own as well), which uses a docker_env, fails with a Docker error when run with the above command.

Expected behavior: Python file is executed and tracked and run is added in mlruns.

Actual behavior: docker throws an error

docker: Error response from daemon: invalid mode: \git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts.

The problem seems to be that mlflow tries to pass a -v flag to docker that maps a host directory to itself:

docker run --rm -v D:\git_repos\mlflow_example\mlruns:/mlflow/tmp/mlruns -v D:\git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts:D:\git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts -e MLFLOW_RUN_ID=e6763b1645214c54bb5d606e3be72170 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 docker-example:93e3a50 python train.py --alpha 0.5 --l1-ratio 0.1
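To illustrate why Docker answers with "invalid mode" here, the following is a minimal sketch of how a -v spec of the form host:container[:mode] can end up split when the container side also starts with a drive letter. It is a simplified model written for this thread, not Docker's actual parser, and the function name split_volume_spec is hypothetical.

```python
import re

# Simplified model (an assumption, not Docker's real parser): an optional
# drive letter such as "D:" is treated as part of the host path, and the
# remainder is read as "<container path>[:<mode>]".
_SPEC = re.compile(r"^(?P<host>(?:[A-Za-z]:)?[^:]*):(?P<rest>.*)$")


def split_volume_spec(spec):
    """Return (host, container, mode) for a `-v` volume spec."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"no container path in {spec!r}")
    # Everything after the first colon of the remainder is taken as the mode.
    container, _, mode = match.group("rest").partition(":")
    return match.group("host"), container, mode


if __name__ == "__main__":
    good = r"D:\git_repos\mlflow_example\mlruns:/mlflow/tmp/mlruns"
    bad = (
        r"D:\git_repos\mlflow_example\mlruns\0\abc\artifacts"
        r":D:\git_repos\mlflow_example\mlruns\0\abc\artifacts"
    )
    print(split_volume_spec(good))  # mode is empty
    print(split_volume_spec(bad))   # container is "D", the rest lands in mode
```

Under this model the second mount yields a container path of "D" and a mode of \git_repos\...\artifacts, which has the same shape as the path quoted in the daemon's error message.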

Code to reproduce issue

Simply follow the instructions in https://github.com/mlflow/mlflow/tree/master/examples/docker

Other info / logs

(CGa_env) D:\git_repos\mlflow_example>mlflow run examples/docker -P alpha=0.5
2020/02/27 16:31:49 INFO mlflow.projects: === Building docker image docker-example:93e3a50 ===
2020/02/27 16:31:49 INFO mlflow.projects: Temporary docker context file C:\Users\CC073~1.GAI\AppData\Local\Temp\tmpfp1uz6ee was not deleted.
2020/02/27 16:31:49 INFO mlflow.projects: === Created directory C:\Users\CC073~1.GAI\AppData\Local\Temp\tmp88nz0lmt for downloading remote URIs passed to arguments of type 'path' ===
2020/02/27 16:31:49 INFO mlflow.projects: === Running command 'docker run --rm -v D:\git_repos\mlflow_example\mlruns:/mlflow/tmp/mlruns -v D:\git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts:D:\git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts -e MLFLOW_RUN_ID=e6763b1645214c54bb5d606e3be72170 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 docker-example:93e3a50 python train.py --alpha 0.5 --l1-ratio 0.1' in run with ID 'e6763b1645214c54bb5d606e3be72170' ===
docker: Error response from daemon: invalid mode: \git_repos\mlflow_example\mlruns\0\e6763b1645214c54bb5d606e3be72170\artifacts.
See 'docker run --help'.
2020/02/27 16:31:49 ERROR mlflow.cli: === Run (ID 'e6763b1645214c54bb5d606e3be72170') failed ===

AndreyBulezyuk commented 4 years ago

Same issue on Windows 10 Pro 1903. The basic example from the MLflow documentation does not work.

MLFlow Project with Docker as Environment fails when used with 'mlflow run .'

Exception: docker: Error response from daemon: invalid mode: \Users\andre\code\mlflow1\productionfirst\mlruns\0\c4132f95210546f787f89591b0e6d00e\artifacts.

MLProject

name: productionfirst

docker_env:
    image:  mlflow-docker-example

entry_points:
  main:
    command: "python classifier.py"

Dockerfile

FROM continuumio/miniconda:4.5.4

# Quote the requirement so the shell does not treat ">=1.0" as an output redirect.
RUN pip install "mlflow>=1.0" \
    && pip install numpy==1.14.3 \
    && pip install scipy \
    && pip install pandas==0.22.0 \
    && pip install scikit-learn==0.19.1 \
    && pip install cloudpickle \
    && pip install Keras \
    && pip install sklearn

Here is the cmd that is being executed:

2020/03/03 14:15:32 INFO mlflow.projects: === Running command 'docker run --rm -v C:\Users\andre\code\mlflow1\productionfirst\mlruns:/mlflow/tmp/mlruns -v C:\Users\andre\code\mlflow1\productionfirst\mlruns\0\9fd34ea8e7ed4e289a0d3c1b1b826fd8\artifacts:C:\Users\andre\code\mlflow1\productionfirst\mlruns\0\9fd34ea8e7ed4e289a0d3c1b1b826fd8\artifacts -e MLFLOW_RUN_ID=9fd34ea8e7ed4e289a0d3c1b1b826fd8 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 productionfirst:latest python3 classifier.py' in run with ID '9fd34ea8e7ed4e289a0d3c1b1b826fd8' ===
docker: Error response from daemon: invalid mode: \Users\andre\code\mlflow1\productionfirst\mlruns\0\9fd34ea8e7ed4e289a0d3c1b1b826fd8\artifacts.

docker run mounts two volumes:

  1. C:\Users\andre\code\mlflow1\productionfirst\mlruns
  2. C:\Users\andre\code\mlflow1\productionfirst\mlruns\0\9fd34ea8e7ed4e289a0d3c1b1b826fd8\artifacts

The second mount is the one Docker rejects in the error above.

Removing the second mount and executing the command manually in a command shell solves the issue. I will try to code a fix and create a PR later.
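The manual workaround described above can be sketched as a small filter over the docker argument list. This is only an illustration: the helper name drop_windows_target_mounts is hypothetical, it is not part of mlflow, and the paths in the example are shortened placeholders.

```python
import re

# Matches "<drive>:\host\path:<drive>:\container\path", i.e. a mount whose
# container side is itself a Windows path with a drive letter.
_DRIVE_TARGET = re.compile(r"^[A-Za-z]:\\[^:]*:[A-Za-z]:\\.*$")


def drop_windows_target_mounts(argv):
    """Return argv without `-v` pairs whose container path has a drive letter."""
    cleaned = []
    i = 0
    while i < len(argv):
        is_bad_mount = (
            argv[i] == "-v"
            and i + 1 < len(argv)
            and _DRIVE_TARGET.match(argv[i + 1]) is not None
        )
        if is_bad_mount:
            i += 2  # skip both the flag and its volume spec
            continue
        cleaned.append(argv[i])
        i += 1
    return cleaned


if __name__ == "__main__":
    command = [
        "docker", "run", "--rm",
        "-v", r"C:\proj\mlruns:/mlflow/tmp/mlruns",
        "-v", r"C:\proj\mlruns\0\abc\artifacts:C:\proj\mlruns\0\abc\artifacts",
        "-e", "MLFLOW_EXPERIMENT_ID=0",
        "productionfirst:latest", "python3", "classifier.py",
    ]
    print(drop_windows_target_mounts(command))
```

The filtered list keeps the first mount and drops the second one. It only reproduces the manual edit of the command and does not change what mlflow itself generates.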

The problem seems to lie in projects/__init__.py, lines 654/655, and in _get_local_artifact_cmd_and_envs, line 804:

artifact_cmds, artifact_envs = \
        _get_docker_artifact_storage_cmd_and_envs(active_run.info.artifact_uri)

AndreyBulezyuk commented 4 years ago

Working on a bugfix

daqieq commented 3 years ago

Same issue with fresh install of mlflow today while following the docker example from mlflow github repo.

jwa5426 commented 3 years ago

Still an issue for me as well. https://github.com/mlflow/mlflow/issues/1335#issuecomment-812686947

aymutlu commented 3 years ago

Same for me, still does not work as illustrated in MLflow documentation:

https://github.com/mlflow/mlflow/tree/master/examples/docker

FarhanAhmad4473 commented 2 years ago

I am also facing the issue. I am following the Docker example as written in MLflow documentation https://github.com/mlflow/mlflow/tree/master/examples/docker

And getting this error upon running the project:

2021/11/16 13:05:12 INFO mlflow.projects.docker: === Building docker image docker-example:d6ae841 ===
2021/11/16 13:05:13 INFO mlflow.projects.docker: Temporary docker context file C:\Users\FARHAN~1\AppData\Local\Temp\tmp95rojz21 was not deleted.
2021/11/16 13:05:13 INFO mlflow.projects.utils: === Created directory C:\Users\FARHAN~1\AppData\Local\Temp\tmpr4ox7wqg for downloading remote URIs passed to arguments of type 'path' ===
2021/11/16 13:05:13 INFO mlflow.projects.backend.local: === Running command 'docker run --rm -v \mlruns.db:/mlflow/tmp/mlruns -v C:\Coding\learning\mlops\MLOPs_with_MLFlow\mlflow\mlflow\mlruns\0\8b92b73848a549e08911fddc54d3c5cb\artifacts:\mlflow\projects\code\mlruns\0\8b92b73848a549e08911fddc54d3c5cb\artifacts -e MLFLOW_RUN_ID=8b92b73848a549e08911fddc54d3c5cb -e MLFLOW_TRACKING_URI=sqlite:///C:/mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 docker-example:d6ae841 python train.py --alpha 0.5 --l1-ratio 0.1' in run with ID '8b92b73848a549e08911fddc54d3c5cb' ===
docker: Error response from daemon: \mlruns.db%!(EXTRA string=is not a valid Windows path).
See 'docker run --help'.
2021/11/16 13:05:14 ERROR mlflow.cli: === Run (ID '8b92b73848a549e08911fddc54d3c5cb') failed ===

MichalNawrot commented 2 years ago

It's still an issue. I've tried both an example from the Machine-Learning-Engineering-with-MLflow book and the official MLflow docker example, using mlflow run . and mlflow run . -P alpha=0.4 respectively (Windows 10, mlflow v1.24.0).

As AndreyBulezyuk mentioned, removing the second mount from the docker run command and executing it manually in a command shell solves the issue, but it breaks the desired mlflow workflow.

Grisly00 commented 2 years ago

This issue has now been open for over 2 (!) years and is allegedly an easy fix. Is there any intention of solving it, or is there a different recommended approach?

karthickme commented 1 year ago

I'm also facing the same issue. Why is container_path an absolute path? This will impact all Windows users.

https://github.com/mlflow/mlflow/blob/7c25e4ddd36d209d22488cdce699419430d74205/mlflow/projects/backend/local.py#L325-L332
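As a rough sketch of what a POSIX-style container path could look like, the helper below maps a Windows host directory to a path under a fixed root inside the container. All names here (container_path_for, volume_flag, the /mlflow/artifacts root) are assumptions made for illustration. This is not the code at the link above and not a confirmed fix.

```python
from pathlib import PurePosixPath, PureWindowsPath


def container_path_for(host_path, root="/mlflow/artifacts"):
    """Map a host directory to a POSIX path below `root` (hypothetical root)."""
    windows_path = PureWindowsPath(host_path)
    # Drop the anchor ("D:\\" or a leading separator) so that no colon
    # remains on the container side of the volume spec.
    parts = windows_path.parts[1:] if windows_path.anchor else windows_path.parts
    return str(PurePosixPath(root, *parts))


def volume_flag(host_path, root="/mlflow/artifacts"):
    """Build the `-v host:container` argument pair for docker run."""
    return ["-v", f"{host_path}:{container_path_for(host_path, root)}"]


if __name__ == "__main__":
    host = r"D:\git_repos\mlflow_example\mlruns\0\abc\artifacts"
    print(volume_flag(host))
```

One consequence of this design is that the drive letter is discarded, so directories with the same path on different drives would map to the same container path. The artifact location seen by the code inside the container would also have to point at the new path, which this sketch does not handle.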

lennartvandeguchte commented 1 year ago

I'm still facing this issue with mlflow v2.6.0 on Windows 10. Does anyone know a workaround?

JINO-ROHIT commented 8 months ago

Same here. I have mlflow version 2.9.2 (latest) and I still face the error on Windows. Any workarounds/solutions? Thanks

JINO-ROHIT commented 7 months ago

@lennartvandeguchte here's what I did: I have Windows 11, I installed WSL for Ubuntu, and now it's all good. Occasionally it gets a bit buggy, but that's alright I guess.
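For anyone scripting this workaround from the Windows side, here is a minimal sketch that builds a command to start mlflow run inside the default WSL distribution. It assumes that mlflow is installed in that distribution and that Docker is reachable from it. The function name wsl_mlflow_run_command is hypothetical.

```python
import subprocess


def wsl_mlflow_run_command(project_dir=".", params=None):
    """Build an argv list that runs `mlflow run` through the `wsl` launcher."""
    argv = ["wsl", "mlflow", "run", project_dir]
    for key, value in (params or {}).items():
        # Each project parameter is passed as "-P name=value".
        argv += ["-P", f"{key}={value}"]
    return argv


def run_in_wsl(project_dir=".", params=None):
    """Execute the command; only works on a Windows host with WSL set up."""
    return subprocess.run(wsl_mlflow_run_command(project_dir, params), check=True)


if __name__ == "__main__":
    print(wsl_mlflow_run_command(".", {"alpha": 0.5}))
```

The project directory is interpreted inside WSL, so a relative path such as "." refers to the directory the launcher is started from.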

mario-schiappacasse-ug commented 5 months ago

> @lennartvandeguchte here's what I did: I have Windows 11, I installed WSL for Ubuntu, and now it's all good. Occasionally it gets a bit buggy, but that's alright I guess.

Could you elaborate more please? I'm still facing this issue. Thanks!

JINO-ROHIT commented 5 months ago

@mario-schiappacasse-ug what os are you using? and what trouble are you having?

mario-schiappacasse-ug commented 5 months ago

@JINO-ROHIT

I'm running Windows 11. Running mlflow run gives the following error:

docker: Error response from daemon: invalid mode: \Users\\project\mlruns\0\\artifacts. See 'docker run --help'.

In the MLproject file I have defined a docker_env.

JINO-ROHIT commented 5 months ago

@mario-schiappacasse-ug hey, the bug is that it doesn't work on Windows. You basically have two choices:

  1. use wsl on windows which gives you a linux environment on a windows machine.
  2. dual boot and run this on ubuntu.

mario-schiappacasse-ug commented 5 months ago

@JINO-ROHIT Thanks! Will try!

I'm currently trying with a devcontainer running Debian, but for some reason mlflow is creating a broken volume for the artifacts.