RuntimeError with CUDA and cuDNN

cooper1x commented 2 months ago

RuntimeError: D:\a\_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:743 onnxruntime::python::CreateExecutionProviderInstance CUDA_PATH is set but CUDA wasn't able to be loaded. Please install the correct version of CUDA and cuDNN as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported.

My GPU is NVIDIA GeForce RTX 4060 Laptop GPU, please tell me how to solve this problem?

NXTler commented 2 months ago

I get the same problem with my RTX 3080 on Linux.

NXTler commented 2 months ago

Was dumb and didn't install CUDA, the problem is that CUDA version 11.8 is not available on the AUR and newer versions don't work. So it's a huge hustle to set up if you don't do it very often. Please consider switching to a newer CUDA version like 12.2

Binozo commented 2 months ago

Having the same issue with CUDA version 12.6 Other applications using CUDA do work

Binozo commented 2 months ago

My bad ✋

The README specifically mentions CUDA Toolkit 11.8

neko commented 2 months ago

even with Cuda compilation tools, release 11.8, V11.8.89 i face the same issue

frostworx commented 2 months ago

@NXTler those steps for installing an old 11.8 cuda work fine: https://superuser.com/questions/1784504/how-do-i-get-cuda-11-7-on-arch-linux

don't forget to set

export CUDA_PATH="/opt/cuda11.8/bin"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cuda11.8/targets/x86_64-linux/lib/

before starting.

seems like a compatible CUDNN needs to be installed as well, as the render process crashes with

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Gemm node. Name:'fullyconnected0' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudnnStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudnnStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDNN failure 1: CUDNN_STATUS_NOT_INITIALIZED ; GPU=0 ; hostname=aaa ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=172 ; expr=cudnnCreate(&cudnn_handle_);

before creating the final video from the already processed png files.

EDIT:

this project https://github.com/carter4299/cuda_tf_torch suggests

cudnn: https://archive.archlinux.org/packages/c/cudnn/cudnn-8.6.0.163-1-x86_64.pkg.tar.zst

for cuda 11.8 ~(haven't tested yet)~

edit 2:

test with above cudnn-8.6.0.163-1 binary installed failed with:

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Gemm node. Name:'fullyconnected0' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cublasStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cublasStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUBLAS failure 3: CUBLAS_STATUS_ALLOC_FAILED ; GPU=0 ; hostname=aaa ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=168 ; expr=cublasCreate(&cublas_handle_);

enough for today :}

mvigerske commented 2 months ago

I have nearly the same error under Windows. Installed everything as recommended, including Cuda 11.8, plus the two lines to install "onnxruntime-gpu==1.16.3". Got a RTX 4060 Ti with 16 GB VRAM. Calculated with an image, I get something good back, including Face Enhancer. As a video, it creates the temp folder including many, but not all, correctly calculated images. Unfortunately, the resolution is quite coarse, so it's useless.

I wonder if it's still working. The GPU usage is zero, but the GPU memory is full. When I end the process, the memory is emptied. Is it still busy or is the memory still full with the last status?

Here is my error message: onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Gemm node. Name:'fc1' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_call.cc:121 onnxruntime::CudaCall D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_call.cc:114 onnxruntime::CudaCall CUBLAS failure 3: CUBLAS_STATUS_ALLOC_FAILED ; GPU=0 ; hostname=LCARS ; file=D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_execution_provider.cc ; line=168 ; expr=cublasCreate(&cublas_handle_);

If I reinstall the requirements and just run python run.py, I get a black rectangle instead of a face when I calculate only one image. I guess the replacement doesn't work. Didn't try video.

EDIT: ok, updated Visual Studio and ffmpeg, then it worked. Now it only complained about too little memory. So just start with "python run.py --execution-provider cuda", without "--execution-threads 60 --max-memory 60", then the face-enhanced video will also work!

frostworx commented 2 months ago

(maybe my weak nvidia gpu is the cuplrit here (single slot GTX 1070 Katana, the other gpu is an amd card))

either way, I'll continue with trying docker, ngaer was so friendly to share their dockerfile here thank you! :)

edit: now, 2 weeks later, I tried again:

deep-live-cam is installed in a venv
the usb camera is connated via usbip (not important, but nice)
regular arch package cudnn-9.2.1.18-1 and custom pkg like described above are installed
I wrote a little helper script which adjusts the required variables and starts deep-live-cam (see below; adjust to your needs)
the program works as expected using the webcam! (both impressive and alarming and creepy what is possible)

#!/bin/bash
GITDIR="/media/nvme/ai/Deep-Live-Cam"
PYVENVDIR="/media/nvme/ai/deep-live-cam"

cd "$GITDIR" || die

source "$PYVENVDIR/bin/activate"
export CUDA_HOME=/opt/cuda11.8/
export CUDA_PATH=/opt/cuda11.8/bin/
export LD_LIBRARY_PATH=:/opt/cuda11.8/targets/x86_64-linux/lib/
python run.py --execution-provider cuda
cd -|| die
deactivate

the host system is headless currently, so I start the gui via vnc (of course this is suboptimal) Would be funny to

run this headless as a server daemon (likely easy to implement, if it does not even work already) (edit: you can provice an imput image with -s bout seems like you can't configure the webcam to be the output target via commandline)
stream the output video stream or export it as linux uvc gadget usb device

hacksider / Deep-Live-Cam

RuntimeError with CUDA and cuDNN #174