microsoft / MLOpsPython

MLOps using Azure ML Services and Azure DevOps
MIT License

(Major?) Bug with mcr.microsoft.com/mlops/python:latest and Azure Machine Learning extension in DevOps #329

Closed kodonnell closed 4 years ago

kodonnell commented 4 years ago

TL;DR: I believe this will prevent anyone from using MLOps inside Azure DevOps.

We're running an MLOps workshop (in partnership with Microsoft) with customers, and things started failing at the Azure ML Model Deploy step.

/usr/local/envs/mlopspython_ci/bin/az ml model deploy -n mlops-aci --model oilwells_model.pkl:4 --ic /__w/1/s/oilwells/scoring/inference_config.yml --dc /__w/1/s/oilwells/scoring/deployment_config_aci.yml -g MLOps-2020-09-22-team-02-prod -w mlops-AML-WS --overwrite
The command failed with an unexpected error. Here is the traceback:

cannot import name 'PROFILE_METADATA_CPU_KEY' from 'azureml._model_management._constants' (/usr/local/envs/mlopspython_ci/lib/python3.7/site-packages/azureml/_model_management/_constants.py)
    op = import_module(mod_to_import)
  File "/usr/local/envs/mlopspython_ci/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/AzDevOps_azpcontainer/.azure/cliextensions/azure-cli-ml/azext_ml/model.py", line 17, in <module>
    from azureml._model_management._constants import ACI_WEBSERVICE_TYPE, AKS_ENDPOINT_TYPE, AKS_WEBSERVICE_TYPE, \
ImportError: cannot import name 'PROFILE_METADATA_CPU_KEY' from 'azureml._model_management._constants' (/usr/local/envs/mlopspython_ci/lib/python3.7/site-packages/azureml/_model_management/_constants.py)

We've traced the issue, at least partly, to mcr.microsoft.com/mlops/python:latest. There are no versions on Docker Hub, so we had to go with latest (which we pulled about 30 minutes before this message). If you look at /usr/local/envs/mlopspython_ci/lib/python3.7/site-packages/azureml/_model_management/_constants.py in that image, there is no PROFILE_METADATA_CPU_KEY (though there is a PROFILE_RECOMMENDED_CPU_KEY, which seems suspicious). I can't find /home/AzDevOps_azpcontainer/.azure/cliextensions/azure-cli-ml/azext_ml/model.py in the image, but the DevOps logs show that it is created later by /usr/local/envs/mlopspython_ci/bin/az extension add -n azure-cli-ml, which installs version 1.14.0 of the extension. If I check out that file, then (removing cruft) ...

from azureml._model_management._constants import ... PROFILE_METADATA_CPU_KEY ...

So it definitely seems to be an error. My guess is that it's a version mismatch somewhere, and if we have time tomorrow, that's what we'll be digging into. If there's no hotfix, we'll have to build our own Docker image on top of mcr.microsoft.com/mlops/python:latest with the issues fixed, which would be painful. (Hint hint to anyone else who comes across this and wants to help out!)
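For anyone who wants to repeat the check described above without opening _constants.py by hand, here is a minimal sketch. The helper name missing_names is made up for illustration; the module path and the two constant names are the ones from the traceback. It has to be run with the Python interpreter of the environment being diagnosed (here, mlopspython_ci).

```python
import importlib


def missing_names(module_name, names):
    """Return the names from `names` that `module_name` does not define."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    # Names taken from the traceback and the manual inspection above.
    wanted = ["PROFILE_METADATA_CPU_KEY", "PROFILE_RECOMMENDED_CPU_KEY"]
    try:
        print(missing_names("azureml._model_management._constants", wanted))
    except ImportError as exc:
        # Raised when the azureml package is not installed in this environment.
        print(f"could not import module: {exc}")
```

If PROFILE_METADATA_CPU_KEY shows up in the printed list, the installed SDK is missing a name that the extension's model.py imports, which matches the ImportError above.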

The second day of the workshop is tomorrow, so needless to say, this is somewhat urgent.

Other info:

kodonnell commented 4 years ago

OK, the probable fix is upgrading azureml-sdk to 1.14.0 in the Docker container. (Untested, but I suspect pip install --upgrade --upgrade-strategy eager azureml-sdk will do all that's needed.) I don't know why this isn't done automatically as part of /usr/local/envs/mlopspython_ci/bin/az extension add -n azure-cli-ml, but the outdated SDK seems to be the cause.
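A rough way to confirm the suspected mismatch is to compare the installed azureml-sdk version against the extension version from the logs (1.14.0). This is only a sketch: the helper names are hypothetical, it assumes Python 3.8+ for importlib.metadata, and it assumes plain numeric versions such as 1.14.0 (no pre-release or post-release suffixes).

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version):
    """Turn '1.14.0' into (1, 14, 0); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def is_older(installed, required):
    """True when `installed` is missing or lower than `required`."""
    if installed is None:
        return True
    return release_tuple(installed) < release_tuple(required)


if __name__ == "__main__":
    sdk = installed_version("azureml-sdk")
    # 1.14.0 is the extension version reported in the pipeline logs.
    print(f"azureml-sdk installed: {sdk}, needs upgrade: {is_older(sdk, '1.14.0')}")
```

Comparing integer tuples rather than raw strings matters here, since a string comparison would rank "1.9.0" above "1.14.0".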

cbertolasio commented 4 years ago

I just encountered this issue today.

j-so commented 4 years ago

We've hit this issue a few times now. I am making a PR to use the latest version of the azureml-sdk. Whenever the version is updated, we can rerun the Docker image build so that the image picks up the new version.

Mitigation - run pip install --upgrade azureml-sdk as a step at the beginning of your pipeline.
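If the pipeline step runs a Python script rather than a shell command, the same mitigation could look roughly like the sketch below. The function names are hypothetical. It invokes pip through sys.executable so the upgrade lands in the environment of the interpreter running the script, which is assumed to be the same environment the az CLI uses. The eager flag mirrors the --upgrade-strategy eager variant suggested earlier in the thread.

```python
import subprocess
import sys


def build_upgrade_command(package, eager=False):
    """Build a pip command that upgrades `package` for the current interpreter."""
    command = [sys.executable, "-m", "pip", "install", "--upgrade"]
    if eager:
        # Also upgrade dependencies, even if the installed ones satisfy the requirement.
        command += ["--upgrade-strategy", "eager"]
    command.append(package)
    return command


def upgrade(package, eager=False):
    """Run the upgrade; raises CalledProcessError if pip exits non-zero."""
    subprocess.run(build_upgrade_command(package, eager), check=True)
```

Calling upgrade("azureml-sdk") from an early pipeline step would then fail the step if pip fails, rather than letting the deploy step hit the ImportError later.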

j-so commented 4 years ago

I have updated the public docker image and updated the ci_dependencies. Please run again and reopen if you're still seeing the issue!

cbertolasio commented 4 years ago

I removed the pip install commands from my pipeline and re-ran the pipeline a couple of times and the previous error went away. Thanks