Mismatch between cuda version in documentation and actual cuda in environment file (and possible solution to that)

ishipachev commented 11 months ago

Before I figured this out, I had issues building the submodules with pip on Windows 10, using the environment.yml file and following all the steps as written in the README.

I tried to understand what the problem was and found that the environment.yml file contains "- cudatoolkit=11.6", while README.md says: "CUDA SDK 11 for PyTorch extensions, install after Visual Studio (we used 11.8, known issues with 11.6)". This reads as a recommendation to install 11.8, but the actual dependency is different. You may end up with CUDA 11.8 installed globally but cudatoolkit=11.6 inside the conda environment.
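A quick way to see which CUDA version the conda environment actually pins is to scan environment.yml for the cudatoolkit line. This is a minimal sketch, assuming the pin is written as "cudatoolkit=<version>". The function name find_cudatoolkit_pin and the sample text are illustrative and not part of the repository:

```python
import re
from typing import Optional


def find_cudatoolkit_pin(env_text: str) -> Optional[str]:
    """Return the version pinned for cudatoolkit in a conda environment file, or None."""
    for line in env_text.splitlines():
        # Matches entries such as "  - cudatoolkit=11.6" or "- cudatoolkit==11.6".
        match = re.match(r"\s*-\s*cudatoolkit\s*=+\s*([\w.]+)", line)
        if match:
            return match.group(1)
    return None


# Abbreviated, hypothetical excerpt; only the cudatoolkit line comes from this issue.
sample = """\
dependencies:
  - cudatoolkit=11.6
"""

print(find_cudatoolkit_pin(sample))
```

To check a real checkout, pass the file contents instead of the sample, for example find_cudatoolkit_pin(open("environment.yml").read()), and compare the result with the CUDA SDK version installed on the system.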

Another problem is that you can't simply install cudatoolkit=11.8 inside the conda environment: in the list of packages there is no build of it alongside the right PyTorch version.

My solution was to use CUDA 11.7, which (unlike CUDA 11.8) is available for PyTorch versions < 2.0. I installed the dependencies step by step with this command:

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

I ran it after installing CUDA 11.7 at the system level (remember, Visual Studio has to be installed already).

Don't forget to check that PyTorch is actually installed with working CUDA support:

(gs) c:\dev\gaussian-splatting>python
>>> import torch
>>> torch.cuda.is_available()
True
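The check above only confirms that PyTorch can see a GPU. It does not show whether the CUDA version PyTorch was built with matches the toolkit that nvcc belongs to. A rough sketch of that comparison follows. The helper names are made up for illustration, and it assumes nvcc is on PATH and prints its usual "release X.Y" line:

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract 'major.minor' from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def system_cuda_version() -> Optional[str]:
    """Version of the nvcc found on PATH, or None if nvcc is missing or fails."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    try:
        result = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(result.stdout)


def torch_cuda_version() -> Optional[str]:
    """CUDA version PyTorch was built against, or None (CPU build or torch missing)."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    print("nvcc on PATH:", system_cuda_version())
    print("PyTorch built with CUDA:", torch_cuda_version())
```

If the two values differ (for example 11.8 from nvcc and 11.6 from PyTorch), that is the kind of mismatch described in this issue.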

ishipachev commented 11 months ago

Yep, after intensive tests it seems fully operational with clean CUDA installs. All the pip-related and CUDA-related installation issues are gone now, even though I had run into every issue I've referenced here.

henrypearce4D commented 11 months ago

I had a question about this. This comment explains more about the documentation and may or may not be useful to you: https://github.com/graphdeco-inria/gaussian-splatting/issues/45#issuecomment-1643897347

ishipachev commented 11 months ago

@henrypearce4D

Thank you.

It seems to add another dimension to this problem. Honestly, I wasn't able to get anywhere with the CUDA 11.8 SDK and CUDA 11.6 in the Python requirements. All my issues were resolved only once I had 11.7 in both places. They may have been caused by other installation problems, such as skipping a reboot after installing or doing things in the wrong order. That is why I posted in every thread with errors I had hit, so people can try this and see whether it helps in the end.

It may be a Windows-specific problem where some libraries are picked up by the Visual Studio compiler in a different order than on Linux machines. I suspect that having one version in the conda environment and a different version of the full SDK installed globally may cause confusion when PyTorch or custom code looks for headers and libraries.
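To see which CUDA installations a build might pick up, it can help to list the CUDA-related environment variables and PATH entries. The sketch below is only a diagnostic aid. It assumes the relevant variables are CUDA_HOME, CUDA_PATH, and the CUDA_PATH_V* variables that the Windows CUDA installer sets, and the function name cuda_env_summary is hypothetical:

```python
import os
from typing import Dict, List, Mapping


def cuda_env_summary(environ: Mapping[str, str]) -> Dict[str, object]:
    """Collect CUDA-related environment variables and PATH entries."""
    variables = {
        name: value
        for name, value in environ.items()
        # Assumption: these are the variables worth inspecting.
        if name.upper() in ("CUDA_HOME", "CUDA_PATH")
        or name.upper().startswith("CUDA_PATH_V")
    }
    # PATH order matters: the first matching directory usually wins.
    path_entries: List[str] = [
        entry
        for entry in environ.get("PATH", "").split(os.pathsep)
        if "cuda" in entry.lower()
    ]
    return {"variables": variables, "path_entries": path_entries}


if __name__ == "__main__":
    summary = cuda_env_summary(os.environ)
    for name, value in summary["variables"].items():
        print(f"{name} = {value}")
    for entry in summary["path_entries"]:
        print("PATH entry:", entry)
```

Seeing two different versions here (say, an 11.8 directory on PATH and 11.6 libraries from the conda environment) would be consistent with the confusion suspected above, although it does not prove it.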

H-tr commented 11 months ago

Hi, if there is sufficient storage, we can actually create a dev container, following nerfstudio's approach:

FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

ENV PYTHON_VERSION=3.7.0

ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PATH="/root/miniconda3/bin:${PATH}"
ARG CUDA_ARCHITECTURES=90;89;86;80;75;70;61;52;37

# Install required apt packages and clear cache afterwards.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    build-essential \
    cmake \
    curl \
    ffmpeg \
    git \
    libatlas-base-dev \
    libboost-filesystem-dev \
    libboost-graph-dev \
    libboost-program-options-dev \
    libboost-system-dev \
    libboost-test-dev \
    libhdf5-dev \
    libcgal-dev \
    libeigen3-dev \
    libflann-dev \
    libfreeimage-dev \
    libgflags-dev \
    libglew-dev \
    libgoogle-glog-dev \
    libmetis-dev \
    libprotobuf-dev \
    libqt5opengl5-dev \
    libsqlite3-dev \
    libsuitesparse-dev \
    nano \
    protobuf-compiler \
    python-is-python3 \
    python3.10-dev \
    python3-pip \
    qtbase5-dev \
    sudo \
    vim-tiny \
    wget && \
    rm -rf /var/lib/apt/lists/*

RUN wget \
    https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && mkdir /root/.conda \
    && bash Miniconda3-latest-Linux-x86_64.sh -b \
    && rm -f Miniconda3-latest-Linux-x86_64.sh 

# Install GLOG (required by ceres).
RUN git clone --branch v0.6.0 https://github.com/google/glog.git --single-branch && \
    cd glog && \
    mkdir build && \
    cd build && \
    cmake .. && \
    make -j `nproc` && \
    make install && \
    cd ../.. && \
    rm -rf glog
# Add glog path to LD_LIBRARY_PATH.
ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/lib"

# Install Ceres-solver (required by colmap).
RUN git clone --branch 2.1.0 https://ceres-solver.googlesource.com/ceres-solver.git --single-branch && \
    cd ceres-solver && \
    git checkout $(git describe --tags) && \
    mkdir build && \
    cd build && \
    cmake .. -DBUILD_TESTING=OFF -DBUILD_EXAMPLES=OFF && \
    make -j `nproc` && \
    make install && \
    cd ../.. && \
    rm -rf ceres-solver

# Install colmap.
RUN git clone --branch 3.8 https://github.com/colmap/colmap.git --single-branch && \
    cd colmap && \
    mkdir build && \
    cd build && \
    cmake .. -DCUDA_ENABLED=ON \
             -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCHITECTURES} && \
    make -j `nproc` && \
    make install && \
    cd ../.. && \
    rm -rf colmap

This environment works well for me. I can open a PR if there is any demand.

grosshill commented 4 months ago

I reinstalled CUDA Toolkit 11.7.1 and it worked, thanks!

graphdeco-inria / gaussian-splatting

Mismatch between cuda version in documentation and actual cuda in environment file (and possible solution to that) #317