Docker build taking forever on CUDA step

ksskatka commented 4 months ago

Hi, I'm building a Docker image using the suggested command "docker build -t opensplat .".

The wget -nv https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-${OS}.pin call in Linux.sh is taking extremely long to execute, with no sign of finishing. I am building this on Linux.
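One way to tell a stalled download apart from a merely slow one is to fetch the pin file by hand with explicit network timeouts. This is a debugging sketch, not part of Linux.sh: it assumes OS holds an Ubuntu repo slug such as ubuntu2204 (check what Linux.sh actually sets), and the helper names cuda_pin_url and fetch_cuda_pin are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for debugging the pin download outside of Linux.sh.
# Assumes the OS argument is a repo slug such as "ubuntu2204".

# Build the same URL that Linux.sh passes to wget.
cuda_pin_url() {
  local os="$1"
  printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/x86_64/cuda-%s.pin\n' "$os" "$os"
}

# Fetch the pin file with progress output (no -nv), giving up after
# 30 s of inactivity per attempt and at most 3 attempts.
fetch_cuda_pin() {
  local os="$1"
  wget --timeout=30 --tries=3 -O "cuda-${os}.pin" "$(cuda_pin_url "$os")"
}
```

Running fetch_cuda_pin ubuntu2204 should finish in seconds for a file this small; if it does, the time is being spent in a later step of the script rather than in wget.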

ksskatka commented 4 months ago

For more context, after adding a -y flag to get past the wget call, the CMake configuration step fails with this error:

[8/8] RUN source .github/workflows/cuda/Linux-env.sh cu"12"$(echo 12.1.1 | cut -d'.' -f2) && mkdir build && cd build && cmake .. -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=/code/libtorch -DCMAKE_INSTALL_PREFIX=/code/install -DCMAKE_CUDA_ARCHITECTURES="70;75;80" -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_HOME} && ninja: ...
28.20 -- Looking for pthread.h
28.25 -- Looking for pthread.h - found
28.25 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
28.30 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
28.30 -- Found Threads: TRUE
51.15 -- GLM: Version 1.0.1
51.15 -- GLM: Build with C++ features auto detection
51.17 CMake Warning at CMakeLists.txt:68 (message):
51.17   CUDA toolkit not found, building with CPU support only
...
51.19 CMake Error at libtorch/share/cmake/Caffe2/Caffe2Config.cmake:91 (message):
51.19   Your installed Caffe2 version uses CUDA but I cannot find the CUDA
51.19   libraries. Please set the proper CUDA prefixes and / or install CUDA.
51.19 Call Stack (most recent call first):
51.19   libtorch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
51.19   CMakeLists.txt:164 (find_package)
51.19
51.19
51.19 -- Configuring incomplete, errors occurred!
51.19 See also "/code/build/CMakeFiles/CMakeOutput.log".
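The telling line is the warning "CUDA toolkit not found": CMake was pointed at ${CUDA_HOME}, but apparently no toolkit was installed there (most likely because the CUDA install step had already failed), so the CUDA-enabled libtorch could not find its libraries. A small pre-flight check before running cmake would fail fast with a clearer message. This is a sketch only: check_cuda_home is a hypothetical helper, not part of OpenSplat, and it assumes the usual toolkit layout with nvcc under $CUDA_HOME/bin.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: verify that CUDA_HOME (or the directory
# passed as $1) contains an executable nvcc before cmake is given it as
# -DCUDA_TOOLKIT_ROOT_DIR.
check_cuda_home() {
  local root="${1:-${CUDA_HOME:-}}"
  if [ -z "$root" ]; then
    echo "CUDA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$root/bin/nvcc" ]; then
    echo "no executable nvcc under $root/bin; is the CUDA toolkit installed?" >&2
    return 1
  fi
  echo "found nvcc at $root/bin/nvcc"
}
```

In the RUN step it would sit in front of the configure call, e.g. check_cuda_home && cmake .. -GNinja ..., so a missing toolkit stops the build immediately instead of surfacing later as the Caffe2 error.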

pfxuan commented 4 months ago

It seems that wget does not support the -y flag, which likely caused the CUDA installation to fail and led to the subsequent CMake configuration error.

The slowness isn't caused by wget; instead, apt-get has to download a large number of CUDA development packages from NVIDIA's Ubuntu package repository. The following commands can take a long time to complete, depending on your network bandwidth:

sudo apt-get -qq update
sudo apt install -y cuda-nvcc-${CUDA/./-} cuda-libraries-dev-${CUDA/./-} cuda-command-line-tools-${CUDA/./-}
sudo apt clean
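The package names rely on Bash pattern substitution: ${CUDA/./-} replaces the first "." in $CUDA with "-", turning a version like 12.1 into the suffix 12-1 used by the apt packages. The sketch below assumes CUDA holds a major.minor version (12.1 is used as an example, matching the 12.1.1 seen in the log); cuda_packages is a hypothetical helper that only prints the names.

```shell
#!/usr/bin/env bash
# Sketch: show how ${CUDA/./-} turns a version into apt package suffixes.
# Assumes CUDA holds "major.minor" (e.g. 12.1). ${var/pattern/repl} replaces
# only the FIRST match, so a full "12.1.1" would become "12-1.1".
cuda_packages() {
  local cuda="$1"
  local suffix="${cuda/./-}"
  printf '%s\n' "cuda-nvcc-${suffix}" \
                "cuda-libraries-dev-${suffix}" \
                "cuda-command-line-tools-${suffix}"
}

cuda_packages 12.1
# cuda-nvcc-12-1
# cuda-libraries-dev-12-1
# cuda-command-line-tools-12-1
```

Because only the first dot is replaced, passing a three-part version would produce package names that do not exist, which is one more way the install step could fail.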

Also, you can compare your output against this Docker build log to determine where your build is currently blocked.

ksskatka commented 4 months ago

Thanks for the response. I ended up removing the -nv flag and saw that it really is just a long process: 82 minutes, to be exact. I'm not sure why it takes so long; other containers with PyTorch and CUDA have built much faster. I'll let this run, and if all is good I'll post an update and close the issue.

ksskatka commented 4 months ago

All's good, thanks for the help!

pierotofy / OpenSplat

Docker build taking forever on CUDA step #111