mit-han-lab / torchsparse

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
https://torchsparse.mit.edu
MIT License

[Installation] Installing inside Docker container #228

Closed: aldipiroli closed this issue 11 months ago

aldipiroli commented 11 months ago

Is there an existing issue for this?

Have you followed all the steps in the FAQ?

Current Behavior

I am trying to install torchsparse v2.1.0 inside a Docker container, specifically the NVIDIA PyTorch containers.

Unfortunately, I was not successful in installing it. However, I believe it would also be useful to other people to have an official Docker container with torchsparse installed.

Here is what I tried so far.

  1. Installing from source. The Dockerfile:

    
        FROM nvcr.io/nvidia/pytorch:23.02-py3 
        ENV DEBIAN_FRONTEND=noninteractive
    
        # ======================================================================
        # Update and Install Tools
        # ======================================================================
        RUN apt-get update -y \
            && apt-get install -y build-essential \
            && apt-get install -y apt-transport-https gnupg software-properties-common meld vim ninja-build libboost-dev sudo nvtop \ 
            && apt-get install -y apt-utils git gitk curl ca-certificates bzip2 tree htop wget zsh\
            && apt-get install -y openexr libopenexr-dev  \
            && apt-get install -y libsparsehash-dev \ 
            && rm -rf /var/lib/apt/lists/
    
        # ======================================================================
        # Install torchsparse
        # ======================================================================
        RUN pip install --upgrade pip 
        RUN git clone https://github.com/mit-han-lab/torchsparse.git \
            && cd torchsparse \
            && pip install --no-cache-dir -r requirements.txt \
            && FORCE_CUDA=1 pip install . 
    
    • Build using docker build -t torchsparse .
    • This yields the following error:
                 ......
                #0 212.0           self.run_command(cmd)
                #0 212.0         File "/usr/local/lib/python3.8/dist-packages/setuptools/dist.py", line 1217, in run_command
                #0 212.0           super().run_command(command)
                #0 212.0         File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
                #0 212.0           cmd_obj.run()
                #0 212.0         File "/usr/local/lib/python3.8/dist-packages/wheel/bdist_wheel.py", line 325, in run
                #0 212.0           self.run_command("build")
                #0 212.0         File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
                #0 212.0           self.distribution.run_command(command)
                #0 212.0         File "/usr/local/lib/python3.8/dist-packages/setuptools/dist.py", line 1217, in run_command
                #0 212.0           super().run_command(command)
                #0 212.0         File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
                #0 212.0           cmd_obj.run()
                #0 212.0         File "/usr/lib/python3.8/distutils/command/build.py", line 135, in run
                #0 212.0           self.run_command(cmd_name)
                #0 212.0         File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
                #0 212.0           self.distribution.run_command(command)
                #0 212.0         File "/usr/local/lib/python3.8/dist-packages/setuptools/dist.py", line 1217, in run_command
                #0 212.0           super().run_command(command)
                #0 212.0         File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
                #0 212.0           cmd_obj.run()
                #0 212.0         File "/usr/local/lib/python3.8/dist-packages/setuptools/command/build_ext.py", line 84, in run
                #0 212.0           _build_ext.run(self)
                #0 212.0         File "/usr/local/lib/python3.8/dist-packages/Cython/Distutils/old_build_ext.py", line 186, in run
                #0 212.0           _build_ext.build_ext.run(self)
                #0 212.0         File "/usr/lib/python3.8/distutils/command/build_ext.py", line 340, in run
                #0 212.0           self.build_extensions()
                #0 212.0         File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 845, in build_extensions
                #0 212.0           build_ext.build_extensions(self)
                #0 212.0         File "/usr/local/lib/python3.8/dist-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
                #0 212.0           _build_ext.build_ext.build_extensions(self)
                #0 212.0         File "/usr/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
                #0 212.0           self._build_extensions_serial()
                #0 212.0         File "/usr/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
                #0 212.0           self.build_extension(ext)
                #0 212.0         File "/usr/local/lib/python3.8/dist-packages/setuptools/command/build_ext.py", line 246, in build_extension
                #0 212.0           _build_ext.build_extension(self, ext)
                #0 212.0         File "/usr/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
                #0 212.0           objects = self.compiler.compile(sources,
                #0 212.0         File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 660, in unix_wrap_ninja_compile
                #0 212.0           _write_ninja_file_and_compile_objects(
                #0 212.0         File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1571, in _write_ninja_file_and_compile_objects
                #0 212.0           _run_ninja_build(
                #0 212.0         File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1906, in _run_ninja_build
                #0 212.0           raise RuntimeError(message) from e
                #0 212.0       RuntimeError: Error compiling objects for extension
                #0 212.0       [end of output]
                #0 212.0   
                #0 212.0   note: This error originates from a subprocess, and is likely not a problem with pip.
                #0 212.0   ERROR: Failed building wheel for torchsparse
                #0 212.0   Running setup.py clean for torchsparse
                #0 213.7 Failed to build torchsparse
                #0 213.7 ERROR: Could not build wheels for torchsparse, which is required to install pyproject.toml-based projects
                ------
                Dockerfile:19
                --------------------
                  18 |     RUN pip install --upgrade pip 
                  19 | >>> RUN git clone https://github.com/mit-han-lab/torchsparse.git \
                  20 | >>>     && cd torchsparse \
                  21 | >>>     && pip install --no-cache-dir -r requirements.txt \
                  22 | >>>     && FORCE_CUDA=1 pip install . 
                  23 |     
                --------------------
                ERROR: failed to solve: process "/bin/sh -c git clone https://github.com/mit-han-lab/torchsparse.git     && cd torchsparse     && pip install --no-cache-dir -r requirements.txt     && FORCE_CUDA=1 pip install ." did not complete successfully: exit code: 1
  2. Installing from the pre-built wheels:

    • The Dockerfile, using nvcr.io/nvidia/pytorch:22.08-py3, which has pytorch==1.13 and CUDA 11.7.1:

      
      FROM nvcr.io/nvidia/pytorch:22.08-py3 
      ENV DEBIAN_FRONTEND=noninteractive
      
      # ======================================================================
      # Update and Install Tools
      # ======================================================================
      RUN apt-get update -y \
          && apt-get install -y build-essential \
          && apt-get install -y apt-transport-https gnupg software-properties-common meld vim ninja-build libboost-dev sudo nvtop \ 
          && apt-get install -y apt-utils git gitk curl ca-certificates bzip2 tree htop wget zsh\
          && apt-get install -y openexr libopenexr-dev  \
          && apt-get install -y libsparsehash-dev \ 
          && rm -rf /var/lib/apt/lists/
      
      # ======================================================================
      # Install torchsparse
      # ======================================================================
      RUN pip install --upgrade pip 
      RUN FORCE_CUDA=1 pip install --extra-index-url http://24.199.104.228/simple --trusted-host 24.199.104.228 torchsparse==2.1.0+torch113cu117 --force-reinstall
      
      • Build using docker build -t torchsparse .
      • Here the package installs; however, when I run the Docker container I get the following error:
      Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) 
      [GCC 10.3.0] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import torchsparse
      Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "torchsparse/__init__.pyx", line 4, in init torchsparse.__init__
      File "torchsparse/operators.pyx", line 6, in init torchsparse.operators
      File "torchsparse/tensor.pyx", line 6, in init torchsparse.tensor
      File "torchsparse/utils/__init__.pyx", line 3, in init torchsparse.utils.__init__
      File "torchsparse/utils/to_dense.pyx", line 7, in init torchsparse.utils.to_dense
      ImportError: /opt/conda/lib/python3.8/site-packages/torchsparse/backend.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
      • This is probably because the wheel I installed was not compiled against the exact torch or CUDA version shipped in the container (see the check below).
      • I get the same undefined symbol error when using the installation script python -c "$(curl -fsSL https://raw.githubusercontent.com/mit-han-lab/torchsparse/master/install.py)"
      • I have tried different NVIDIA containers with different torch and CUDA versions, e.g., torch2.1cu12.1, torch2.1cu12.0, torch1.13cu11.7, ... . All yield similar errors.
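      • As a sanity check, the exact PyTorch and CUDA build inside a container can be printed with:

        python -c "import torch; print(torch.__version__, torch.version.cuda)"

        Note that the NGC images ship pre-release PyTorch builds (e.g. 1.13.0a0), which may not match the stock releases the pre-built wheels target.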

Do you have any suggestions on how I can proceed?

Error Line

First method:

                  #0 213.7 Failed to build torchsparse
                  #0 213.7 ERROR: Could not build wheels for torchsparse, which is required to install pyproject.toml-based projects
                  ------
                  Dockerfile:19
                  --------------------
                    18 |     RUN pip install --upgrade pip 
                    19 | >>> RUN git clone https://github.com/mit-han-lab/torchsparse.git \
                    20 | >>>     && cd torchsparse \
                    21 | >>>     && pip install --no-cache-dir -r requirements.txt \
                    22 | >>>     && FORCE_CUDA=1 pip install . 
                    23 |     
                  --------------------
                  ERROR: failed to solve: process "/bin/sh -c git clone https://github.com/mit-han-lab/torchsparse.git     && cd torchsparse     && pip install --no-cache-dir -r requirements.txt     && FORCE_CUDA=1 pip install ." did not complete successfully: exit code: 1

Second method:


      Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) 
      [GCC 10.3.0] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import torchsparse
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "torchsparse/__init__.pyx", line 4, in init torchsparse.__init__
        File "torchsparse/operators.pyx", line 6, in init torchsparse.operators
        File "torchsparse/tensor.pyx", line 6, in init torchsparse.tensor
        File "torchsparse/utils/__init__.pyx", line 3, in init torchsparse.utils.__init__
        File "torchsparse/utils/to_dense.pyx", line 7, in init torchsparse.utils.to_dense
      ImportError: /opt/conda/lib/python3.8/site-packages/torchsparse/backend.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

Environment

- GCC:
- NVCC:
- PyTorch:
- PyTorch CUDA:

Full Error Log

Error Log [PUT YOUR ERROR LOG HERE]
ys-2020 commented 11 months ago

Hi @aldipiroli, thanks for your interest and efforts in integrating TorchSparse into Docker containers.

For the first method (building from source), the source code of TorchSparse 2.1.0 is currently undergoing internal quality checks and is not yet available, so the code you are building is not v2.1.0.

For the second method, the undefined symbol error is likely caused by using a PyTorch version that is not in our support matrix.

One possible solution is to reinstall PyTorch through pip install torch==2.0.1 --force-reinstall in your docker container and then use python -c "$(curl -fsSL https://raw.githubusercontent.com/mit-han-lab/torchsparse/master/install.py)" to install torchsparse.
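
For reference, inside the container those two steps would look roughly like this (an untested sketch; it assumes the base image's CUDA runtime is compatible with the official torch==2.0.1 wheel):

  pip install torch==2.0.1 --force-reinstall
  python -c "$(curl -fsSL https://raw.githubusercontent.com/mit-han-lab/torchsparse/master/install.py)"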

Alternatively, you can pull a different PyTorch Docker image. Below is a sample Dockerfile to try:

  FROM pytorch/pytorch:latest
  ENV DEBIAN_FRONTEND=noninteractive

  # ======================================================================
  # Update and Install Tools
  # ======================================================================
  RUN apt-get update -y \
      && apt-get install -y build-essential \
      && apt-get install -y apt-transport-https gnupg software-properties-common meld vim ninja-build libboost-dev sudo nvtop \ 
      && apt-get install -y apt-utils git gitk curl ca-certificates bzip2 tree htop wget zsh\
      && apt-get install -y openexr libopenexr-dev  \
      && apt-get install -y libsparsehash-dev \ 
      && rm -rf /var/lib/apt/lists/

  # ======================================================================
  # Install torchsparse
  # ======================================================================
  RUN pip install --upgrade pip 
  RUN FORCE_CUDA=1 python -c "$(curl -fsSL https://raw.githubusercontent.com/mit-han-lab/torchsparse/master/install.py)"
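
If it helps, the image can be built and smoke-tested with something like the following (the image name is just a placeholder, and --gpus all assumes the NVIDIA container toolkit is installed on the host):

  docker build -t torchsparse-docker .
  docker run --gpus all --rm torchsparse-docker python -c "import torchsparse; print('torchsparse import OK')"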
aldipiroli commented 11 months ago

@ys-2020 thank you very much for your answer! The minimal Docker example works well. When the final code is published I'll try again to install it in the NVIDIA Docker images, and if needed open a PR.