Hi, I'm getting really nice results with the large models, but performance is terrible (<1 FPS). I'm currently running an RTX 2080 Ti that's barely getting any use. Would GPU acceleration give a significant performance boost? If so, could you please be more specific as to how to set up DepthViewer with CUDA/cuDNN? There are so many options that I'm not sure what/how to install exactly, or where to get them from.
Yes, GPU acceleration would boost the performance. I get 13+ FPS for `dpt_hybrid_384`.

Prepare the CUDA/cuDNN, then in the program open Options -> Model Settings. Select `dpt_hybrid_384` or any other model and toggle `Use GPU for OnnxRuntime`. The `CUDA` dropdown shouldn't be changed (the others are not implemented). Click the `Load` button and it'll be set if no problem arises.
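For reference, a quick sanity check from Python (assuming the `onnxruntime-gpu` pip package in an environment matching your CUDA; this checks your CUDA/cuDNN installation via the Python package, separate from the ORT build bundled with the app):

```python
# Verify that onnxruntime can see the CUDA execution provider.
# Both the Python package and the DLLs bundled with DepthViewer
# need the same CUDA/cuDNN libraries reachable from PATH.
import onnxruntime as ort

print(ort.get_device())                # "GPU" if the CUDA build is active
print(ort.get_available_providers())  # should include "CUDAExecutionProvider"
```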
> Prepare the CUDA/cuDNN
I think this is what I'm missing. I installed the official CUDA installer and ran all the pip install scripts here. But whenever I check `Use GPU for OnnxRuntime` and click `Load`, it says `Model is not set!`
You don't have to install OnnxRuntime via pip, since the DLL files are included in the build. I think the issue here is that cuDNN has not been installed. Are there any `cudnn*.dll`-like files under the CUDA bin folder, or any directory reachable from `PATH`? Also, the console output (activated by the backtick [`] key) would be helpful.
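If it helps, a small script to scan `PATH` for them (just a convenience; checking the CUDA bin folder by hand works too):

```python
# List every cudnn*.dll reachable from the PATH environment variable (Windows).
import glob, os

for d in os.environ.get("PATH", "").split(os.pathsep):
    for hit in glob.glob(os.path.join(d, "cudnn*.dll")):
        print(hit)
```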
Ohh, now we're getting somewhere, but not quite there yet.

I installed CUDA 11.7, then extracted the cuDNN bin DLL files into `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin`, and now I'm able to load `dpt_hybrid_384` with `Use GPU for OnnxRuntime` enabled and set to `CUDA`.

However, as soon as I load a file, the whole program crashes. Is there a way to enable logging to check what went wrong?
The log file is `Player.log`, under `C:\Users\<USERNAME>\AppData\LocalLow\parkchamchi\DepthViewer`. Please check if any significant output exists under the line `Loading model:`.
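A tiny script to dump just that part of the log, if convenient (assumes the default log location above; adjust as needed):

```python
# Print everything from the "Loading model:" line onward in Player.log.
import os

log = os.path.expandvars(
    r"%USERPROFILE%\AppData\LocalLow\parkchamchi\DepthViewer\Player.log")
dump = False
with open(log, encoding="utf-8", errors="replace") as f:
    for line in f:
        if line.startswith("Loading model:"):
            dump = True
        if dump:
            print(line, end="")
```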
It appears that a fatal error occurred loading `onnxruntime_providers_cuda.dll` or `onnxruntime_providers_shared.dll`. My guess is that the cuDNN version does not match the version of CUDA. My setup is CUDA v11.7 and cuDNN v8.2.4. If the problem persists, try walking the dependencies for both `DepthViewer.exe` and the DLL files in `./DepthViewer_Data/Plugins/x86_64`.
That was it! I tried it (cuDNN 8.4.2) and now I'm getting around 10 FPS on the first run and higher on subsequent plays! By the way, this program also works with NVIDIA 3D Vision, which is what I used to capture full-res 1080p cross-eyed 3D screenshots 👌

Anyhow, I guess the only thing to sort out is a possible bottleneck. Neither the CPU nor the GPU is over 1/3 usage, but it's still dropping a significant number of frames. Any idea why this is happening? 🤔
Running the same model with a Python OnnxRuntime script (with no 3D visualization or anything) gives me ~25 FPS, so I guess there is a significant bottleneck here. My guess is that it's the GPU-CPU round trip, since the code using ORT fetches the `RenderTexture` (GPU) into a `Texture2D` (CPU), converts it to a float array, then converts that to a tensor (GPU). Testing the built-in model (MiDaS v2.1 small 256): the default Barracuda one, which does not have such CPU-GPU overhead, gets ~500 FPS, while the OnnxRuntime one gets ~150 FPS, with significant oscillation. What is weird is that the small model has a fairly insignificant overhead, unlike the large one.
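(The Python-side number above comes from an inference-only loop along these lines; a minimal sketch with a dummy input tensor, so the input name/shape is an assumption, check your model's with e.g. Netron:)

```python
# Inference-only benchmark: no video decode, no visualization, no GPU-CPU copies.
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("dpt_hybrid_384.onnx",
                            providers=["CUDAExecutionProvider"])
inp = sess.get_inputs()[0]
x = np.random.rand(1, 3, 384, 384).astype(np.float32)  # stand-in frame

n = 100
t0 = time.time()
for _ in range(n):
    sess.run(None, {inp.name: x})
print(f"{n / (time.time() - t0):.1f} FPS (inference only)")
```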
The only reason OnnxRuntime is used is that Unity's ML framework Barracuda 3.0.0 wouldn't accept MiDaS v3+ models. I'll update it when Barracuda 4.0 comes out and (hopefully) supports the newer models.
Hi, I have the same `Model is not set!` message when loading another model. I installed CUDA 12.2 and copied the cuDNN 8.9 DLLs into CUDA's bin directory.

The error message I get is:

```
Loading model: ./onnx\dpt_beit_large_512.onnx
OnnxRuntimeDepthModel(): using the provider CUDA
Using gpuid=0
LoadModel(): Got exception: Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:RuntimeException] D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1069 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "V:\Games_VR\DepthViewer\Build\DepthViewer_Data\Plugins\x86_64\onnxruntime_providers_cuda.dll"
```
@CubicReg Guess you'll also wanna stick to CUDA v11.7 and cuDNN v8.2.4 lol

@parkchamchi So, any idea which model gives the most accurate results while still running in realtime? And which one do you personally use? The default one has good performance but wiggles a lot, and I think it's also a lot jaggier.

On a side note, maybe it'd be nice to save the last-used model, since I have to switch from the built-in one every time I run the app 😅
@CubicReg The current ORT version does not support CUDA 12.x. #
@ThreeDeeJay I use `dpt_hybrid_384`, since it's accurate and robust. I agree that a model preloading option would be convenient; I'll add it later.
Alright, guess I'll close this issue since I technically did get GPU acceleration working and better performance, though I'd love to stay updated on a possible bottleneck fix/workaround, because I'd love to have `dpt_beit_large_512` doing its magic at full speed.

In the meantime, I made a guide for people new to this on setting up DepthViewer with GPU acceleration and better models, so feel free to adapt it into the readme, because I have a feeling a lot of people try this app, see bad accuracy/performance, then just quit and miss out on its full potential:
:fast_forward: = Skip unless you want more accurate results, though possibly worse performance

1. Install CUDA 11.7 (https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_516.01_windows.exe)
2. Extract the `cudnn-11.4-windows-x64-v8.2.4.15.zip\cuda\bin\` DLL files (https://developer.nvidia.com/compute/machine-learning/cudnn/secure/8.2.4/11.4_20210831/cudnn-11.4-windows-x64-v8.2.4.15.zip) into the `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin` folder
3. :fast_forward: Download models into the `DepthViewer\onnx` folder (see the helper script after this list):
   - https://github.com/parkchamchi/MiDaS/releases/download/22.12.07/dpt_hybrid_384.onnx (worst depth detection but faster)
   - https://github.com/parkchamchi/MiDaS/releases/download/23.02.18/dpt_swin2_large_384.onnx (somewhere in between)
   - https://github.com/parkchamchi/MiDaS/releases/download/23.02.18/dpt_beit_large_512.onnx (good detection but slower)
4. In Options -> Model Settings, select `dpt_hybrid_384` or `dpt_beit_large_512`, toggle `Use GPU for OnnxRuntime`, and click `Load` (the model settings field can stay `null`, like this)
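(A throwaway helper for step 3, if you'd rather script the downloads; the destination path is an assumption, point it at wherever your `DepthViewer\onnx` folder is:)

```python
# Fetch the three models listed above into the DepthViewer\onnx folder.
import os, urllib.request

urls = [
    "https://github.com/parkchamchi/MiDaS/releases/download/22.12.07/dpt_hybrid_384.onnx",
    "https://github.com/parkchamchi/MiDaS/releases/download/23.02.18/dpt_swin2_large_384.onnx",
    "https://github.com/parkchamchi/MiDaS/releases/download/23.02.18/dpt_beit_large_512.onnx",
]
dest = r".\DepthViewer\onnx"  # adjust to your install location
os.makedirs(dest, exist_ok=True)
for url in urls:
    out = os.path.join(dest, url.rsplit("/", 1)[-1])
    print("Fetching", url)
    urllib.request.urlretrieve(url, out)
```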
Some music videos with `dpt_beit_large_512`:
I still have the same `Model is not set!` error message when loading another model, after installing the files from your links (https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_516.01_windows.exe and https://developer.nvidia.com/compute/machine-learning/cudnn/secure/8.2.4/11.4_20210831/cudnn-11.4-windows-x64-v8.2.4.15.zip). The models load fine when I don't activate the `Use GPU for OnnxRuntime` option.
The full log when loading a model:

```
Loading model: ./onnx\dpt_swin2_large_384.onnx
OnnxRuntimeDepthModel(): using the provider CUDA
Using gpuid=0
LoadModel(): Got exception: Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:RuntimeException] D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1069 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "V:\Games_VR\DepthViewer\Build\DepthViewer_Data\Plugins\x86_64\onnxruntime_providers_cuda.dll"
  at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess (System.IntPtr nativeStatus) [0x0002c] in <d7591e6396b14c7b9ff6a962184da8b3>:0
  at Microsoft.ML.OnnxRuntime.SessionOptions.AppendExecutionProvider_CUDA (System.Int32 deviceId) [0x0000d] in <d7591e6396b14c7b9ff6a962184da8b3>:0
  at Microsoft.ML.OnnxRuntime.SessionOptions.MakeSessionOptionWithCudaProvider (System.Int32 deviceId) [0x0000d] in <d7591e6396b14c7b9ff6a962184da8b3>:0
  at OnnxRuntimeDepthModel..ctor (System.String onnxpath, System.String modelType, System.String provider, System.Int32 gpuid, System.String settings) [0x00150] in <638d9a2582cc46beaff79da68ac7e852>:0
  at DepthModelBehavior.GetDepthModel (System.String onnxpath, System.String modelType, System.Boolean useOnnxRuntime) [0x0003b] in <638d9a2582cc46beaff79da68ac7e852>:0
  at MainBehavior.LoadModel (System.String onnxpath, System.Boolean useOnnxRuntime) [0x0003d] in <638d9a2582cc46beaff79da68ac7e852>:0
Failed to load: dpt_swin2_large_384
```

I have the latest NVIDIA drivers on a 4090.
That's weird, can you run `nvcc --version` in cmd to check if it is on the `PATH`?
> That's weird, can you run `nvcc --version` in cmd to check if it is on the `PATH`?
With `nvcc --version` in cmd I first got `'nvcc' is not recognized as an internal or external command, operable program or batch file`.

I checked the `PATH`; in System Variables I had both `CUDA_PATH` and `CUDA_PATH_V11_7` set to `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7`. I had installed the setup with only the CUDA runtimes. So I re-installed the CUDA setup with the runtimes and the compiler options under the development section; still the same error. Then I re-installed the CUDA setup with the runtimes, the compiler, and the tools under the development section. Now the command `nvcc --version` works:
```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
```
But I still get `Model is not set!` when loading a model with `Use GPU for OnnxRuntime`. The log from `LocalLow\parkchamchi\DepthViewer\Player.log` shows the same error message as before, too.
`C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin` should be under `PATH`, since that directory is where the CUDA/cuDNN DLL files (and `nvcc.exe`) are located. I don't know what you mean by "development section", but since `nvcc --version` works I assume that it is under `PATH`. If the cuDNN DLL files are under that directory it should work. But as it doesn't, try walking the dependencies of `DepthViewer_Data/Plugins/x86_64/onnxruntime_providers_cuda.dll` and see if there is any missing dependency. If none of these work I don't know what the problem is, and in that case I'd recommend the python-zeromq method, which infers the ML model on the Python side, not the C#/Unity side.
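Something like this should reproduce the LoadLibrary failure outside the app (error 126 means a dependent DLL could not be found; the path below is the one from your log, adjust it to yours):

```python
# Try to load the provider DLL directly. If any DLL it depends on
# (e.g. a cuDNN or CUDA runtime DLL) is missing from PATH, this fails
# with WinError 126, the same error the app reports.
import ctypes

dll = r"V:\Games_VR\DepthViewer\Build\DepthViewer_Data\Plugins\x86_64\onnxruntime_providers_cuda.dll"
try:
    ctypes.WinDLL(dll)
    print("Loaded OK; its dependencies resolve.")
except OSError as e:
    print("Failed:", e)
```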
> `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin` should be under `PATH`

Right, I do have `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin` in the `PATH` environment variable.
> I don't know what you mean by "development section"

I was talking about the options in the setup.
I checked every dependency DLL of `DepthViewer_Data\Plugins\x86_64\onnxruntime_providers_cuda.dll`; each one is present and has the status "Loading PE file xxxx.dll successful" in Dependency Walker. I will think about the Python method.

Edit: actually, after unfolding several levels of dependencies in Dependency Walker, I see `ext-ms-win-oobe-query-l1-1-0.dll` missing, but from what I read on Stack Overflow it can be ignored.
I got the models loading via GPU CUDA now; I can't explain what I did other than restarting the computer. As far as I'm concerned this issue can be closed.
> I got the models loading via GPU CUDA now; I can't explain what I did other than restarting the computer. As far as I'm concerned this issue can be closed.
Glad to hear that. I think restarting the computer may have affected the CUDA setup.
> Testing the built-in model (MiDaS v2.1 small 256): the default Barracuda one, which does not have such CPU-GPU overhead, gets ~500 FPS, while the OnnxRuntime one gets ~150 FPS, with significant oscillation. What is weird is that the small model has a fairly insignificant overhead, unlike the large one. The only reason OnnxRuntime is used is that Unity's ML framework Barracuda 3.0.0 wouldn't accept MiDaS v3+ models. I'll update it when Barracuda 4.0 comes out and (hopefully) supports the newer models.
@parkchamchi By the way, did you check out the performance with Sentis? Apparently it's the successor of Barracuda 3.0:
https://forum.unity.com/threads/unity-sentis.1454530/
https://blog.unity.com/engine-platform/introducing-unity-muse-and-unity-sentis-ai
https://docs.unity3d.com/Packages/com.unity.sentis@1.3/manual/index.html
https://docs.unity3d.com/Packages/com.unity.sentis@1.1/manual/upgrade-guide.html

I still haven't figured out how to get at least 24 FPS video on the high-quality models, not even with CUDA 😔 I know there's the option to pre-generate the whole .depthviewer file, but that's also affected by the performance bottleneck, and I'm not sure if there's a way to force ffpymq to load them 🤔

On a side note, I ran some benchmarks here to compare the output of multiple models. There are more, but I got errors since they're not implemented.
> Sentis
Thanks for letting me know, I'll try it later. Maybe it can relieve the bottleneck.
P.S. The table is great, thank you.
@parkchamchi Thanks, I can confirm Depth Anything in ONNX format now works directly via the Unity app 👌

However, performance-wise, there doesn't seem to be a noticeable change (still low FPS with massive drops on heavy models, and low GPU/CPU usage).
Here are some tests I ran. Mostly Sentis, except the `_ONNX` ones, which are with onnxruntime checked in the options.
| Model | FPS | CPU | GPU | RAM | Build |
|---|---|---|---|---|---|
| Built-in MiDaS | 100% | 17% | 31% | 3900MB | v0.10.0-beta 1 |
| dpt_hybrid_384+CUDA | 10 | 13% | 54% | 5000MB | v0.10.0-beta 1 |
| depth_anything_vits14+CUDA | 15 | 14% | 47% | 4800MB | v0.10.0-beta 1 |
| depth_anything_vitb14+CUDA | 8 | 9% | 70% | 5000MB | v0.10.0-beta 1 |
| depth_anything_vitl14+CUDA | 5* | 10% | 85% | 8300MB | v0.10.0-beta 1 |
| depth_anything_vitl14_ONNX | 0.2 | 32% | 0% | 7000MB | v0.10.0-beta 1 |
| depth_anything_vitl14_ONNX+CUDA | 8* | 15% | 50% | 7000MB | v0.10.0-beta 1 |
| depth_anything_vitl14_ffpymq+CUDA | 2 | 26% | 70% | 5300MB | v0.10.0-beta 1 |
| dpt_beit_large_512_ONNX+CUDA | 8 | 15% | 50% | 7000MB | v0.9.1 |
| dpt_beit_large_512_ONNX+CUDA | 8 | 15% | 50% | 7000MB | v0.10.0-beta 1 |

*Constant spikes
Did you notice similar performance on BEiT/Depth Anything? I wonder if run_video.py implemented optimizations for video performance. And what versions of Python, CUDA, cuDNN, and Torch do you use? Someone brought up xformers, which apparently isn't available for CUDA 11.7, so I wonder if implementing that and allowing a CUDA upgrade would help squeeze out a few extra frames 🤔
That `run_video.py` file seems to behave identically to our script, inferring per frame. (See under `while raw_video.isOpened():`.)
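Schematically, that loop is just this (a sketch; `infer` is a hypothetical stand-in for the actual depth model call):

```python
# Per-frame inference pattern: one full model pass per decoded frame.
import cv2

def infer(frame):
    return None  # hypothetical stand-in; replace with the model inference

raw_video = cv2.VideoCapture("input.mp4")
while raw_video.isOpened():
    ok, frame = raw_video.read()
    if not ok:
        break
    depth = infer(frame)  # this call dominates the per-frame cost
raw_video.release()
```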
> and what versions of Python, CUDA, cuDNN, and Torch do you use?

Python 3.9.6, CUDA v11.7, cuDNN v8.3.1, torch 2.0.1+cu117. Higher versions of Python/Torch would work with the existing scripts.