nerfstudio-project / nerfstudio

A collaboration friendly studio for NeRFs
https://docs.nerf.studio
Apache License 2.0

Latest Docker images don't run instant-ngp on 3090 series cards - CUDA error: no kernel image is available for execution on the device #2878

Open kurtjcu opened 7 months ago

kurtjcu commented 7 months ago

Describe the bug: The current Docker images with tags "main", "1.0.1", and "1.0.0" crash when training with the following error:

RuntimeError: CUDA error: no kernel image is available for execution on the device

To Reproduce: Run the following command on a machine with a 3090 (Ubuntu Server, NVIDIA driver supporting CUDA up to 12.3):

wandb docker-run \
            --gpus "device=1" \
            -u $(id -u) \
            -v "datadir:/workspace" \
            -v '/home/user/.cache/:/home/user/.cache/' \
            -p 7017:7007 \
            --rm -it --shm-size=12gb \
            dromni/nerfstudio:1.0.1 ns-train instant-ngp \
            --vis wandb \
            --data "/workspace" 

Expected behavior: The same command works correctly with container tag "0.3.4".
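
When this error appears, a useful first check is to compare the GPU's compute capability with the architectures PyTorch inside the container was built for. The following is a minimal diagnostic sketch, not something taken from this thread: the helper names `sm_name` and `gpu_report` are hypothetical, and the script only inspects PyTorch itself. Separately compiled CUDA extensions carry their own architecture lists, which this script does not see.

```python
# Sketch: report what the GPU is versus what PyTorch was compiled for.
# Meant to be run inside the container. torch is imported lazily so the
# helpers can still be imported on a machine without PyTorch.


def sm_name(capability):
    """Turn a (major, minor) compute capability into a name such as 'sm_86'."""
    major, minor = capability
    return f"sm_{major}{minor}"


def gpu_report(device_index=0):
    """Return a dict describing the GPU and the PyTorch build, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return {
        "name": torch.cuda.get_device_name(device_index),
        "device_arch": sm_name(torch.cuda.get_device_capability(device_index)),
        "torch_cuda": torch.version.cuda,
        # Architectures PyTorch's own kernels were compiled for.
        "torch_arch_list": torch.cuda.get_arch_list(),
    }


if __name__ == "__main__":
    print(gpu_report())
```

A 3090 reports compute capability 8.6, so `device_arch` would be `sm_86`. If the list looks compatible and the error still occurs, the failing kernel probably comes from an extension that was compiled separately.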

yuefengyf commented 6 months ago

+1, I see a similar issue with splatfacto on Docker images "1.0.0" and "1.0.1". Here is the error I got:

[Screenshot of the error, 2024-02-14; image not preserved]

rowellz commented 6 months ago

I am getting similar issues as well for both 1.0.0 and 1.0.1 images :(

jkulhanek commented 6 months ago

Hi, can you please try the nerfstudio/nerfstudio:latest image?

rowellz commented 6 months ago

That seems to do the trick for me, TYSM! I've been using gaussian-splatting with the dromni/nerfstudio:main image for a while now. Is there any difference or improvement in the splatfacto method in the nerfstudio/nerfstudio:latest image?

Edit: Just wanted to say I am running on an RTX 3060 12GB

jkulhanek commented 6 months ago

Good to hear it works. I am still testing the image, so please let me know if you find any issues. Unfortunately, I don't have access to the Dockerfile used to build dromni/nerfstudio:main, but the nerfstudio/nerfstudio:latest image simply contains the latest commit on the main branch of nerfstudio.

kurtjcu commented 6 months ago

I was having the same issue on all the latest images as well. I used Docker to build an image from the main branch, and then it worked.

eduardohenriquearnold commented 3 months ago

I was having the same issue on an NVIDIA T4, but the nerfstudio/nerfstudio:latest image seems to work like a charm. Thanks, @jkulhanek.

May I ask what you changed? From what I found online, the error seems to be related to the CUDA architectures the image was compiled for, but the official nerfstudio Docker image appears to include the most common architectures, including sm_75, which corresponds to the T4. So I'm curious what specifically fixed the issue here.
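
The thread does not say what changed between the images, so the sketch below only illustrates the general CUDA compatibility rule behind the question. Binary code built for `sm_XY` runs on devices with the same major version and an equal or newer minor version, while embedded PTX (`compute_XY`) can be JIT-compiled on any device at least that new. The function names `parse_arch` and `can_run` are hypothetical, and the check deliberately ignores suffixed variants such as `sm_90a`, which have stricter rules.

```python
import re


def parse_arch(arch):
    """Split 'sm_86' or 'compute_86' into (kind, major, minor)."""
    match = re.fullmatch(r"(sm|compute)_(\d+)(\d)[a-z]?", arch)
    if match is None:
        raise ValueError(f"unrecognised architecture string: {arch!r}")
    kind, major, minor = match.groups()
    return kind, int(major), int(minor)


def can_run(device_capability, arch_list):
    """Return True if any entry in arch_list is usable on the given device.

    Simplified: suffix letters (as in 'sm_90a') are ignored.
    """
    dev_major, dev_minor = device_capability
    for arch in arch_list:
        kind, major, minor = parse_arch(arch)
        if kind == "sm":
            # Binary code: same major version, same or older minor version.
            if major == dev_major and minor <= dev_minor:
                return True
        else:
            # PTX: can be JIT-compiled for any device at least as new.
            if (major, minor) <= (dev_major, dev_minor):
                return True
    return False


if __name__ == "__main__":
    # A 3090 (8.6) with only Volta/Turing binaries has no usable kernel.
    print(can_run((8, 6), ["sm_70", "sm_75"]))
    # A T4 (7.5) is covered when sm_75 is in the list.
    print(can_run((7, 5), ["sm_75", "sm_80", "sm_86"]))
```

Under this rule, a list that contains `sm_75` should cover a T4, which is why the question above is a fair one. One possible explanation, offered only as an assumption, is that the failing kernel came from a component built with a different architecture list than the one documented for the image.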