motional / nuplan-devkit

The devkit of the nuPlan dataset.
https://www.nuplan.org

pip failed when creating docker image #246

Closed: Shi-Qi-Li closed this issue 1 year ago

Shi-Qi-Li commented 1 year ago

Hi developers, I ran into an error while following the Submission Tutorial to build the Docker image, and I don't know how to solve it.

$ docker build --network host -f Dockerfile.submission . -t nuplan-evalservice-server:test.contestant
Sending build context to Docker daemon   7.88MB
Step 1/20 : FROM ubuntu:20.04
 ---> a0ce5a295b63
Step 2/20 : RUN apt-get update     && apt-get install -y curl gnupg2 software-properties-common default-jdk
 ---> Using cache
 ---> 0a73e949bb03
Step 3/20 : ENV APT_KEY_DONT_WARN_ON_DANGEROUS_USAGE=DontWarn
 ---> Using cache
 ---> 11c2934d17e6
Step 4/20 : RUN curl -fsSL https://bazel.build/bazel-release.pub.gpg | apt-key add -     && curl -fsSL https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -     && curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu20.04/nvidia-docker.list | tee /etc/apt/sources.list.d/nvidia-docker.list     && add-apt-repository "deb [arch=amd64] https://storage.googleapis.com/bazel-apt stable jdk1.8"     && apt-get update     && apt-get install -y         bazel         file         zip         nvidia-container-toolkit         software-properties-common     && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 1314e0b0c0ab
Step 5/20 : ENV PATH /opt/conda/bin:$PATH
 ---> Using cache
 ---> 7e7915fe06ef
Step 6/20 : RUN curl -fsSLo Miniconda3-latest-Linux-x86_64.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh &&     bash Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda &&     rm Miniconda3-latest-Linux-x86_64.sh &&     conda clean -a -y
 ---> Using cache
 ---> a67db9d657d3
Step 7/20 : ARG NUPLAN_HOME=/nuplan_devkit
 ---> Using cache
 ---> 2c8bbaa6cef7
Step 8/20 : WORKDIR $NUPLAN_HOME
 ---> Using cache
 ---> 757d75284f21
Step 9/20 : COPY requirements.txt requirements_torch.txt requirements_submission.txt environment_submission.yml /nuplan_devkit/
 ---> 08f2eabf6ec8
Step 10/20 : RUN python --version
 ---> Running in cf94c8f21d49
Python 3.10.9
Removing intermediate container cf94c8f21d49
 ---> 62623bdfeac6
Step 11/20 : RUN conda env create -f $NUPLAN_HOME/environment_submission.yml
 ---> Running in d4dd1491e283
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

Downloading and Extracting Packages

Preparing transaction: ...working... done            
Verifying transaction: ...working... done            
Executing transaction: ...working... done            
Installing pip dependencies: ...working... Ran pip subprocess with arguments:
['/opt/conda/envs/nuplan/bin/python', '-m', 'pip', 'install', '-U', '-r', '/nuplan_devkit/condaenv.ru1hh1v6.requirements.txt', '--exists-action=b']
Pip subprocess output:                               
Looking in links: https://download.pytorch.org/whl/torch_stable.html, https://data.pyg.org/whl/torch-1.9.0+cu111.html
Ignoring torch: markers 'platform_system == "Darwin"' don't match your environment

failed                                               
Pip subprocess error:                                
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='download.pytorch.org', port=443): Read timed out. (read timeout=15)")': /whl/torch_stable.html  
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='download.pytorch.org', port=443): Read timed out. (read timeout=15)")': /whl/torch_stable.html  
ERROR: Could not find a version that satisfies the requirement torch==1.9.0+cu111 (from versions: 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0)
ERROR: No matching distribution found for torch==1.9.0+cu111

CondaEnvException: Pip failed

The command '/bin/sh -c conda env create -f $NUPLAN_HOME/environment_submission.yml' returned a non-zero code: 1

I printed the Python version in Dockerfile.submission (step 10 in the log above), and it is 3.10.9. I'm not sure whether that is correct.

gianmarco-motional commented 1 year ago

The Python version you are printing is not the one specified in the conda environment. To check whether the right version is used, you can remove the pip dependencies (so that your install won't fail), leaving environment_submission.yml like this:

name: nuplan
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pip=21.2.4

Then run your python --version command after the SHELL line

SHELL ["conda", "run", "-n", "nuplan", "/bin/bash", "-c"]

in Dockerfile.submission.
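
The same check can be scripted rather than read off the build log. The following is only a sketch: the helper names `pinned_python` and `version_matches` are hypothetical, and it assumes the environment file pins Python with a line of the form `- python=3.9`, as in the trimmed file above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of nuplan-devkit.

# Print the Python version pinned in a conda environment file,
# e.g. "3.9" for a line like "  - python=3.9".
pinned_python() {
    sed -n 's/^[[:space:]]*-[[:space:]]*python=\([0-9][0-9.]*\).*$/\1/p' "$1" | head -n 1
}

# Succeed if the output of "python --version" (e.g. "Python 3.9.6")
# is the pinned version or a patch release of it (e.g. "3.9").
version_matches() {
    case "$2" in
        "Python $1"|"Python $1."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not executed here), run where the nuplan env exists:
#   version_matches "$(pinned_python environment_submission.yml)" \
#                   "$(conda run -n nuplan python --version)" && echo "env Python OK"
```

With this, the base interpreter's 3.10.9 would be reported as a mismatch against the pinned 3.9, while the env's own interpreter would pass.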

Shi-Qi-Li commented 1 year ago

Thanks for your quick reply. I removed the pip dependencies and got 3.9.6 as output after the SHELL line, so I think the Python version is right. Are there any other factors that could cause the torch installation to fail?

JingyuQian commented 1 year ago

@Shi-Qi-Li If you followed the tutorial, it is likely that your Docker environment is not CUDA-enabled.

If that is the case, follow https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit to install nvidia-container-toolkit. That should solve your problem.

Shi-Qi-Li commented 1 year ago

Thanks for your reply! I installed nvidia-container-toolkit, but the problem persisted. I eventually solved it by downloading the .whl file and installing from it.
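
This workaround is consistent with the log in the first post: pip timed out reading /whl/torch_stable.html from download.pytorch.org, so the only candidates it listed were plain versions without the +cu111 tag. A minimal sketch of the manual route follows. The function names are hypothetical, and the wheel file name and URL layout are assumptions derived from the requirement torch==1.9.0+cu111 and the env's Python 3.9; check them against the PyTorch wheel index before relying on them.

```shell
#!/bin/sh
# Sketch only. The wheel naming scheme and URL layout are assumptions based on
# the requirement in the log (torch==1.9.0+cu111) and the env's Python 3.9.

# Wheel file name for a given torch version, CUDA tag and CPython tag.
torch_wheel_name() {
    printf 'torch-%s+%s-%s-%s-linux_x86_64.whl\n' "$1" "$2" "$3" "$3"
}

# Download URL; the '+' in the file name is percent-encoded as %2B.
torch_wheel_url() {
    printf 'https://download.pytorch.org/whl/%s/' "$2"
    torch_wheel_name "$1" "$2" "$3" | sed 's/+/%2B/'
}

# Download with retries and resume support, then install the local file.
# Not called automatically; run it with the nuplan env's python on PATH.
install_torch_wheel() {
    curl -fSL --retry 5 -C - -o "$(torch_wheel_name "$@")" "$(torch_wheel_url "$@")" &&
        python -m pip install "./$(torch_wheel_name "$@")"
}

# Example (not executed here):
#   install_torch_wheel 1.9.0 cu111 cp39
```

Downloading with curl separately from the install means an interrupted transfer can be resumed instead of restarting the whole pip step.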

bulbcult commented 1 year ago

@Shi-Qi-Li How long did the 'pip install' step take for you? I have spent hours on it and still end up with 'pip failed': https://github.com/motional/nuplan-devkit/issues/286#issue-1695435219

Shi-Qi-Li commented 1 year ago

Hi @bulbcult, it took me about 1-2 hours.
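
On a slow connection, one thing that may help is giving pip more time before it gives up: the log in the first post shows a 15-second read timeout and a handful of retries. The sketch below wraps a command so that pip's `PIP_DEFAULT_TIMEOUT` and `PIP_RETRIES` environment variables are set for it. The wrapper name `with_patient_pip` is hypothetical, the values 120 and 10 are arbitrary assumptions, and whether this is enough for a given network is untested.

```shell
#!/bin/sh
# Sketch only: the defaults below (120 s timeout, 10 retries) are arbitrary.
# pip reads PIP_DEFAULT_TIMEOUT and PIP_RETRIES from the environment; the pip
# subprocess that conda starts should inherit them from the calling shell.
with_patient_pip() {
    env PIP_DEFAULT_TIMEOUT="${PIP_DEFAULT_TIMEOUT:-120}" \
        PIP_RETRIES="${PIP_RETRIES:-10}" \
        "$@"
}

# Example (not executed here):
#   with_patient_pip conda env create -f environment_submission.yml
```

Values already exported in the calling shell take precedence over the defaults, so the wrapper does not override an existing configuration.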