Closed: PrinceWang-Cal closed this issue 1 year ago
Two days is way too long. It should take maybe 30 minutes at most; for me the build took no more than 5-10 minutes for a K80.
If you have an NVIDIA GPU for which your CUDA installation is missing kernels, the NVIDIA toolchain will sometimes silently try to build kernel images for your GPU. That can take a very long time, and basically nothing is printed to standard output while it runs. For example, if you run TensorFlow or PyTorch for the first time and you're missing CUDA support for your GPU, this can trigger and make it look like your TF session or PyTorch import is hanging indefinitely. I believe this part of the NVIDIA toolchain leaves a fairly large hidden directory (its name starts with a period) in your home directory.
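To make the "missing kernels" case concrete, here is a minimal sketch that classifies whether a build's architecture list covers a given GPU. The helper name `classify_support` is hypothetical, and the rule it applies is a simplification: a precompiled `sm_XY` image is treated as usable on a GPU with the same major version and an equal or higher minor version, while a `compute_XY` (PTX) entry is treated as usable only through just-in-time compilation, which is the slow, silent path described above. The optional part at the bottom assumes PyTorch is installed and uses `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`; it reports on the PyTorch build itself, not on any extension you compile.

```python
import re

# Matches entries such as "sm_37", "sm_86", or "compute_70".
_ARCH_RE = re.compile(r"^(sm|compute)_(\d+)(\d)$")


def classify_support(capability, arch_list):
    """Classify how a build with `arch_list` can run on a GPU of `capability`.

    Returns "native" if a precompiled image (sm_XY) is binary compatible,
    "jit" if only PTX (compute_XY) is usable, so kernels must be compiled
    at first use, and "none" if neither applies.
    """
    major, minor = capability
    has_ptx = False
    for arch in arch_list:
        match = _ARCH_RE.match(arch)
        if match is None:
            continue  # skip entries this sketch does not understand, e.g. "sm_90a"
        kind = match.group(1)
        a_major, a_minor = int(match.group(2)), int(match.group(3))
        if kind == "sm" and a_major == major and a_minor <= minor:
            return "native"
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            has_ptx = True
    return "jit" if has_ptx else "none"


if __name__ == "__main__":
    # Optional: inspect the local PyTorch build, if PyTorch and a GPU are present.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print(cap, classify_support(cap, torch.cuda.get_arch_list()))
```

A result of "jit" would be consistent with the long, silent first run described above; "native" suggests the hang has some other cause.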
If you have Docker / nvidia-docker available, here is a sketch of a Dockerfile I was able to use to successfully build the CUDA extensions and run all the code:
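# Base image ships PyTorch 1.8.1 with the CUDA 11.1 development toolchain (nvcc).
FROM pytorch/pytorch:1.8.1-cuda11.1-cudnn8-devel

# System packages, including the build tools (cmake, ninja) used for the CUDA extension.
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y vim curl wget git less python3-pip libopencv-dev cmake build-essential ninja-build

RUN pip3 install --upgrade pip

# Python dependencies, with the versions pinned where the original setup pinned them.
RUN pip3 install \
    numpy \
    imageio \
    imageio-ffmpeg \
    ipdb \
    lpips \
    opencv-python==4.4.0.46 \
    Pillow==7.2.0 \
    pyyaml \
    tensorboard==2.7.0 \
    pymcubes \
    moviepy \
    matplotlib \
    scipy==1.6.0 \
    tqdm
RUN pip3 install nerfvis

# Reminder printed at image build time; the extension itself is built inside the container.
RUN echo "now run $ MAX_JOBS=16 pip3 install -vvv . to build svox2 CUDA kernels"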
For future reference, a useful tip is to run `pip install -e . --verbose` to check whether there are any errors. The build should only take several minutes, maybe a bit longer without ninja (no parallel builds).
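As a rough sketch of that tip, the snippet below assembles the verbose install command, sets `MAX_JOBS` (the variable used in the Dockerfile reminder above), and checks whether a `ninja` executable is on `PATH`. The function names `build_plan` and `run_build` are hypothetical, and the default of 4 jobs is an arbitrary assumption. On a machine without `sudo`, `pip install ninja` is one way to get ninja, since the PyPI package ships the executable.

```python
import os
import shutil
import subprocess
import sys


def build_plan(max_jobs=4, environ=None):
    """Return (command, env, has_ninja) for a verbose editable install."""
    env = dict(os.environ if environ is None else environ)
    env["MAX_JOBS"] = str(max_jobs)  # caps the number of parallel compile jobs
    cmd = [sys.executable, "-m", "pip", "install", "-e", ".", "--verbose"]
    # Look for ninja on the PATH that the build itself will see.
    has_ninja = shutil.which("ninja", path=env.get("PATH")) is not None
    return cmd, env, has_ninja


def run_build(max_jobs=4):
    """Run the install from the current directory, streaming pip's output."""
    cmd, env, has_ninja = build_plan(max_jobs)
    if not has_ninja:
        print("ninja not found on PATH; the build may be slower "
              "(`pip install ninja` needs no sudo)", file=sys.stderr)
    return subprocess.call(cmd, env=env)
```

Calling `run_build(16)` from the repository root would mirror the `MAX_JOBS=16` command in the Dockerfile reminder. Because the output is streamed, a build that is actually stuck shows up as a long pause after a specific compile step rather than as silence.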
I am installing the extension on a machine where I don't have access to `sudo apt`. The installation has been running for two days, and it is not done yet. Is this normal? What is the expected amount of time to install it?