Closed jonastemplestein closed 10 months ago
The NVIDIA Driver was not detected.
This usually pops up if the GPU is not detected. To double-check, you can open IEx, run Mix.install([:exla]); EXLA.Client.default_name() and see if it returns :cuda or :host.
If it's not detected, you can also try 11.8, just in case the driver does not support the latest CUDA. I'm mostly guessing though; the issue may lie somewhere lower level.
and the image is deprecated but that doesn't matter
That's OK. NVIDIA now has a policy of deprecating images when they build a new version, and they remove the deprecated images in 6 months. The removal is not an issue, because the Livebook CUDA image is not going to disappear.
Closing this. If it persists, please let us know!
Hello, I'm using CUDA 12.4 and cuDNN 8.9 with WSL on Windows and the latest-cuda11.8 image.
With the 12.1 image, I don't see cuda in the list of supported platforms when running the command above, but with 11.8, I do get this:
iex(2)> EXLA.Client.get_supported_platforms()
%{host: 12, cuda: -1, interpreter: 1}
iex(3)> EXLA.Client.default_name()
:host
I assume that the Stable Diffusion job won't run on the GPU, given that the target is host instead of cuda? I originally tried the latest version of cuDNN (9.0) but didn't even see cuda: -1 in the list of supported platforms. I've also run the NVIDIA benchmark checks in Docker (WSL with Ubuntu), and they show my GPU.
Should I be trying another version of CUDA or cuDNN?
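For reading a result like the one above, here is a minimal sketch of a helper that lists which platforms report at least one device. PlatformCheck is a hypothetical module, not part of EXLA, and it assumes that a non-positive count such as cuda: -1 means the platform is known but no usable device was found.

```elixir
# Hypothetical helper, not part of EXLA: it only inspects a map shaped like
# the one returned by EXLA.Client.get_supported_platforms/0.
defmodule PlatformCheck do
  # Returns the platforms that report at least one device, sorted by name.
  # Assumption: a count of zero or below means no usable device.
  def usable(platforms) when is_map(platforms) do
    platforms
    |> Enum.filter(fn {_name, device_count} -> device_count > 0 end)
    |> Enum.map(fn {name, _device_count} -> name end)
    |> Enum.sort()
  end

  # True only when the :cuda platform is present with a positive device count.
  def cuda_usable?(platforms), do: :cuda in usable(platforms)
end

reported = %{host: 12, cuda: -1, interpreter: 1}
IO.inspect(PlatformCheck.usable(reported), label: "usable platforms")
IO.inspect(PlatformCheck.cuda_usable?(reported), label: "cuda usable?")
```

With the map posted above, only host and interpreter count as usable under that assumption, which would be consistent with EXLA.Client.default_name() returning :host.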
@PranavRam are there any relevant warnings in the logs?
With the 12.1 image, I don't see cuda in the list of supported platforms
This is weird, are you sure you set XLA_TARGET=cuda120? You can run Mix.install([:exla], force: true) and double-check in the logs that it downloads the CUDA xla archive.
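As a rough sketch of that check, the target can also be passed to Mix.install through its :system_env option, so that a forced reinstall and the platform query happen in one IEx session. This assumes network access and that XLA_TARGET is not already set to something else in the environment; the labels are only for readability.

```elixir
# Sketch only: needs network access to fetch the precompiled xla archive.
# XLA_TARGET is read when the xla dependency is fetched and compiled,
# so it must be in place before Mix.install runs.
Mix.install([:exla], force: true, system_env: %{"XLA_TARGET" => "cuda120"})

# After the install finishes, see which platforms EXLA reports
# and which client it picks by default.
IO.inspect(EXLA.Client.get_supported_platforms(), label: "platforms")
IO.inspect(EXLA.Client.default_name(), label: "default client")
```

If the install logs show a CPU archive being downloaded instead of a CUDA one, the target was probably not picked up.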
Should I be trying another version of CUDA or cuDNN?
When running the Docker container, both CUDA and cuDNN are already installed in the container, so it doesn't matter which versions you have installed on the system. Only the drivers must be compatible with the given CUDA version.
Have you tried iex in WSL directly, rather than the Docker image? Did that work differently?
I'm trying to deploy the livebook:latest-cuda12.1 image to a fly.io host with an A100 GPU.
When I boot up the container, I get this big warning banner saying CUDA support will not be available (and the image is deprecated but that doesn't matter).
Is this expected? If I open a shell in the running container and run the following commands it looks like maybe the driver is working as intended?
Is the message in the banner on startup expected?