machine-learning-apps / ml-template-azure

Template for getting started with automated ML Ops on Azure Machine Learning
MIT License

Submit training run fails while installing package 'conda-forge::cycler-0.11.0-pyhd8ed1ab_0'. #18

Closed RajdeepBiswas closed 2 years ago

RajdeepBiswas commented 2 years ago

From the logs:

ERROR conda.core.link:_execute(502): An error occurred while installing package 'conda-forge::cycler-0.11.0-pyhd8ed1ab_0'. FileNotFoundError(2, "No such file or directory: '/azureml-envs/azureml_1767a447844ba221bf5dc7377e1011f3/bin/python3.6'") Attempting to roll back.

done

FileNotFoundError(2, "No such file or directory: '/azureml-envs/azureml_1767a447844ba221bf5dc7377e1011f3/bin/python3.6'")

marvinbuss commented 2 years ago

Hi @RajdeepBiswas, is this related to the package that you are trying to install? Did the installation succeed before? Which version of the GH Action are you using?

rarora1979 commented 2 years ago

I followed all the steps given in the video: http://www.youtube.com/watch?v=bmFr0LYo_6o

I am getting this error:

 ERROR conda.core.link:_execute(502): An error occurred while installing package 'conda-forge::cycler-0.11.0-pyhd8ed1ab_0'. FileNotFoundError(2, "No such file or directory: '/azureml-envs/azureml_1767a447844ba221bf5dc7377e1011f3/bin/python3.6'") Attempting to roll back.


Excerpt from the logs:

Executing transaction: ...working... failed
ERROR conda.core.link:_execute(502): An error occurred while installing package 'conda-forge::cycler-0.11.0-pyhd8ed1ab_0'. FileNotFoundError(2, "No such file or directory: '/azureml-envs/azureml_1767a447844ba221bf5dc7377e1011f3/bin/python3.6'") Attempting to roll back.

Rolling back transaction: ...working... done
FileNotFoundError(2, "No such file or directory: '/azureml-envs/azureml_1767a447844ba221bf5dc7377e1011f3/bin/python3.6'")

The command '/bin/sh -c ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_1767a447844ba221bf5dc7377e1011f3 -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && ldconfig' returned a non-zero code: 1
2021/11/22 19:00:05 Container failed during run: acb_step_0. No retries remaining.
failed to run step ID: acb_step_0: exit status 1

Run ID: ca1 failed after 1h7m56s. Error: failed during run, err: exit status 1
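
For anyone triaging a long image-build log like the one above, it can help to pull out just the failing package and the path conda could not find. The following is a minimal sketch, not part of the template or of any Azure ML tooling; the function name `parse_link_error` is made up for illustration, and the pattern only covers the exact error format quoted in this thread.

```python
import re

# Matches the conda link error format quoted in this thread.
# Assumption: other conda versions may word the message differently.
LINK_ERROR = re.compile(
    r"An error occurred while installing package '(?P<package>[^']+)'\.\s*"
    r"FileNotFoundError\(2, \"No such file or directory: '(?P<path>[^']+)'\"\)"
)


def parse_link_error(log_text):
    """Return (package, missing_path) for the first link error, or None."""
    match = LINK_ERROR.search(log_text)
    if match is None:
        return None
    return match.group("package"), match.group("path")


# The log line reported in this issue, split only for readability.
sample = (
    "ERROR conda.core.link:_execute(502): An error occurred while installing "
    "package 'conda-forge::cycler-0.11.0-pyhd8ed1ab_0'. "
    "FileNotFoundError(2, \"No such file or directory: "
    "'/azureml-envs/azureml_1767a447844ba221bf5dc7377e1011f3/bin/python3.6'\") "
    "Attempting to roll back."
)

result = parse_link_error(sample)
if result is not None:
    package, missing_path = result
    print("failing package:", package)
    print("missing path:", missing_path)
```

In this case the missing path is the environment's own `bin/python3.6`, which suggests the problem lies with the Python interpreter in the environment rather than with `cycler` itself.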

malikamalik commented 2 years ago

I am getting the same error. Did you find a fix, @rarora1979?

marvinbuss commented 2 years ago

Quick update: this is caused by the environment specification. The Python packages in it need to be updated. I have already done this here: https://github.com/malikamalik/quicktest. I will merge the change back into the main repo later this week.
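
The thread does not show the environment file itself, so the following is only a sketch of one check that may be useful before resubmitting: seeing how (or whether) `python` is pinned in the conda dependency list. The helper name `python_pin` and the example dependency list are hypothetical, and the check is deliberately simple; it does not resolve packages or confirm that the pinned version is the cause of the failure.

```python
import re

# Matches a dependency whose package name is exactly "python"
# (so "python-dateutil" is not mistaken for the interpreter).
PYTHON_DEP = re.compile(r"^python(?![\w.-])\s*(?P<constraint>.*)$")


def python_pin(dependencies):
    """Inspect a conda dependency list for the 'python' entry.

    Returns the version constraint string (for example "=3.6.2"),
    an empty string if python is listed without a constraint,
    or None if python is not listed at all.
    """
    for dep in dependencies:
        if not isinstance(dep, str):
            continue  # skip nested sections such as {"pip": [...]}
        match = PYTHON_DEP.match(dep.strip())
        if match:
            return match.group("constraint")
    return None


# Hypothetical dependency list, NOT the template's actual environment file.
example_dependencies = [
    "python=3.6.2",
    "python-dateutil",
    {"pip": ["azureml-defaults"]},
]

pin = python_pin(example_dependencies)
if pin is None:
    print("python is not listed; conda will choose a version")
elif pin == "":
    print("python is listed without a version constraint")
else:
    print("python constraint:", pin)
```

If the constraint differs from the interpreter the build expects (here the log refers to `python3.6`), that mismatch is worth ruling out when updating the package versions.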