Closed cooper1x closed 1 week ago
I get the same problem with my RTX 3080 on Linux.
I was dumb and didn't install CUDA. The problem is that CUDA 11.8 is not available on the AUR, and newer versions don't work, so it's a huge hassle to set up if you don't do it very often. Please consider switching to a newer CUDA version like 12.2.
Having the same issue with CUDA 12.6.
Other applications using CUDA do work,
even with CUDA compilation tools release 11.8, V11.8.89.
I face the same issue.
@NXTler These steps for installing an old CUDA 11.8 work fine: https://superuser.com/questions/1784504/how-do-i-get-cuda-11-7-on-arch-linux
don't forget to set
export CUDA_PATH="/opt/cuda11.8/bin"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cuda11.8/targets/x86_64-linux/lib/
before starting.
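To check that those two exports actually point at a real install before launching, here is a minimal sketch. It assumes the /opt/cuda11.8 layout from the exports above; `cuda_env_ok` is my own helper name, not part of the project:

```python
import os

def cuda_env_ok(bin_dir, lib_dir):
    """Rough check that an unpacked CUDA 11.8 tree is where the exports
    say it is: nvcc in the bin dir, libcudart in the lib dir."""
    has_nvcc = os.path.isfile(os.path.join(bin_dir, "nvcc"))
    has_cudart = os.path.isdir(lib_dir) and any(
        name.startswith("libcudart") for name in os.listdir(lib_dir)
    )
    return has_nvcc and has_cudart

if __name__ == "__main__":
    # Paths assumed from the exports above; adjust to your install.
    ok = cuda_env_ok("/opt/cuda11.8/bin",
                     "/opt/cuda11.8/targets/x86_64-linux/lib")
    print("CUDA 11.8 paths look sane:", ok)
```

If this prints False, onnxruntime-gpu will silently fall back to CPU or fail at session creation, so it's worth running before debugging anything else.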
It seems a compatible cuDNN needs to be installed as well, as the render process crashes with
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Gemm node. Name:'fullyconnected0' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudnnStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudnnStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDNN failure 1: CUDNN_STATUS_NOT_INITIALIZED ; GPU=0 ; hostname=aaa ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=172 ; expr=cudnnCreate(&cudnn_handle_);
before creating the final video from the already processed PNG files.
EDIT:
This project https://github.com/carter4299/cuda_tf_torch suggests this cuDNN package:
cudnn: https://archive.archlinux.org/packages/c/cudnn/cudnn-8.6.0.163-1-x86_64.pkg.tar.zst
for CUDA 11.8 ~(haven't tested yet)~
EDIT 2:
A test with the above cudnn-8.6.0.163-1 package installed failed with:
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Gemm node. Name:'fullyconnected0' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cublasStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cublasStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUBLAS failure 3: CUBLAS_STATUS_ALLOC_FAILED ; GPU=0 ; hostname=aaa ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=168 ; expr=cublasCreate(&cublas_handle_);
enough for today :}
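Both tracebacks above fail inside session creation (`cudnnCreate` / `cublasCreate`), i.e. while initializing the CUDA execution provider. One way to keep the app usable while debugging the CUDA/cuDNN install is to always pass CPU as a fallback provider. A hedged sketch of that selection logic; `pick_providers` is my own helper, only the provider names come from onnxruntime:

```python
def pick_providers(available):
    """Prefer the CUDA execution provider when onnxruntime reports it,
    but always keep CPU as a fallback so a broken CUDA/cuDNN install
    degrades to slow inference instead of crashing.
    `available` is what onnxruntime.get_available_providers() returns."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]

# Usage sketch (requires onnxruntime-gpu and a model file):
#   import onnxruntime as ort
#   sess = ort.InferenceSession(
#       "model.onnx",
#       providers=pick_providers(ort.get_available_providers()))
```

Note this only avoids the crash; it doesn't fix the underlying cuDNN mismatch, since inference then runs on CPU.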
I have nearly the same error under Windows. I installed everything as recommended, including CUDA 11.8, plus the two lines to install "onnxruntime-gpu==1.16.3". I have an RTX 4060 Ti with 16 GB VRAM. When I process a single image, I get something good back, including the Face Enhancer. With a video, it creates the temp folder containing many, but not all, correctly processed images. Unfortunately, the resolution is quite coarse, so it's useless.
I wonder if it's still working: GPU usage is zero, but GPU memory is full. When I end the process, the memory is freed. Is it still busy, or is the memory just holding the last state?
Here is my error message:
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Gemm node. Name:'fc1' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_call.cc:121 onnxruntime::CudaCall D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_call.cc:114 onnxruntime::CudaCall CUBLAS failure 3: CUBLAS_STATUS_ALLOC_FAILED ; GPU=0 ; hostname=LCARS ; file=D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_execution_provider.cc ; line=168 ; expr=cublasCreate(&cublas_handle_);
If I reinstall the requirements and just run python run.py, I get a black rectangle instead of a face when processing a single image. I guess the replacement doesn't work. I didn't try video.
EDIT: OK, I updated Visual Studio and ffmpeg, and then it worked. After that, it only complained about too little memory. So just start with "python run.py --execution-provider cuda", without "--execution-threads 60 --max-memory 60", and the face-enhanced video will work too!
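CUBLAS_STATUS_ALLOC_FAILED in the earlier tracebacks usually means the cuBLAS handle couldn't allocate GPU memory, so before guessing at --max-memory it helps to see how much VRAM is actually free. A small sketch around nvidia-smi's standard query interface; the `parse_free_mb` / `free_vram_mb` helpers are my own names:

```python
import subprocess

def parse_free_mb(csv_text):
    """Parse the output of
    `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits`:
    one integer (MiB) per line, one line per GPU."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]

def free_vram_mb():
    """Query free VRAM per GPU. Requires the NVIDIA driver's nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_free_mb(out)
```

If the reported free memory is small while the app is idle, that matches the "GPU usage is zero but GPU memory is full" observation above: onnxruntime keeps its allocations until the process exits.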
(maybe my weak NVIDIA GPU is the culprit here (single-slot GTX 1070 Katana; the other GPU is an AMD card))
Either way, I'll continue trying Docker; ngaer was friendly enough to share their Dockerfile here, thank you! :)
EDIT: Now, 2 weeks later, I tried again:
#!/bin/bash
GITDIR="/media/nvme/ai/Deep-Live-Cam"
PYVENVDIR="/media/nvme/ai/deep-live-cam"
cd "$GITDIR" || exit 1
source "$PYVENVDIR/bin/activate"
export CUDA_HOME=/opt/cuda11.8/
export CUDA_PATH=/opt/cuda11.8/bin/
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/cuda11.8/targets/x86_64-linux/lib/"
python run.py --execution-provider cuda
cd - || exit 1
deactivate
The host system is currently headless, so I start the GUI via VNC (of course this is suboptimal). Would be funny to
My GPU is an NVIDIA GeForce RTX 4060 Laptop GPU. Please tell me how to solve this problem.