pythongosssss / ComfyUI-WD14-Tagger

A ComfyUI extension allowing for the interrogation of booru tags from images.
MIT License

Trying to increase speed by using CUDAExecutionProvider from onnxruntime-gpu instead of CPU onnxruntime, met with warning about CPU-GPU transfer bottleneck #62

Open andrewtvuong opened 1 week ago

andrewtvuong commented 1 week ago

I want to use CUDA instead of CPU to speed up tag inference.

My machine: Ubuntu 22.04.3 LTS (GNU/Linux 6.5.0-35-generic x86_64), CUDA 12.2

I learned from https://onnxruntime.ai/docs/install/ that if you have CUDA 12 you must (as of the time of writing) install with pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ instead of simply pip install onnxruntime-gpu, which targets CUDA 11. This took me a while to figure out; I kept getting errors that didn't make sense:

[E:onnxruntime, provider_bridge_ort.cc:1744 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory

[W:onnxruntime, onnxruntime_pybind_state.cc:870 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
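As a side note, a quick way to sanity-check that the installed onnxruntime build actually exposes the CUDA provider (these are just generic onnxruntime calls, nothing specific to this repo):

```python
import onnxruntime as ort

# Lists the execution providers this onnxruntime build can offer.
# If "CUDAExecutionProvider" is missing, the CPU-only wheel or a wheel built
# for the wrong CUDA version is installed.
print(ort.get_available_providers())

# "GPU" here means GPU support was compiled into the wheel, not that a
# session will necessarily end up using it.
print(ort.get_device())
```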

I had those shared objects, but after reading more carefully and reinstalling the CUDA 12 build as described above, it worked. Using CUDAExecutionProvider instead of CPUExecutionProvider, however, produced a new warning:

[W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 12 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.

Basically it's bottlenecked by CPU/GPU data transfer. I'm trying to figure out how to avoid that, but haven't managed to yet.
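One direction I'm looking at is ONNX Runtime's IOBinding API, which keeps inputs and outputs in GPU memory instead of round-tripping through the host on every run. Below is only a minimal sketch: the model path, input shape, and the input/output names ("input", "output") are placeholders, not the actual WD14 model's, and this won't remove Memcpy nodes that come from ops the CUDA EP itself can't run.

```python
import numpy as np
import onnxruntime as ort

sess_options = ort.SessionOptions()
# The warning suggests this to see exactly which nodes trigger the Memcpy inserts.
sess_options.log_severity_level = 1

sess = ort.InferenceSession(
    "wd14_model.onnx",  # placeholder path
    sess_options=sess_options,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Preprocessed image batch; NHWC float32 is just an example, the real tagger
# preprocessing may differ.
image = np.zeros((1, 448, 448, 3), dtype=np.float32)

# Bind input/output to CUDA device memory so ORT doesn't copy through the CPU
# on every run. ortvalue_from_numpy copies the array to the GPU once.
io = sess.io_binding()
x_gpu = ort.OrtValue.ortvalue_from_numpy(image, "cuda", 0)
io.bind_ortvalue_input("input", x_gpu)        # placeholder input name
io.bind_output("output", device_type="cuda")  # placeholder output name

sess.run_with_iobinding(io)
probs = io.copy_outputs_to_cpu()[0]  # copy the final result back once
```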

andrewtvuong commented 1 week ago

Opened this pull request to show what I tried: https://github.com/pythongosssss/ComfyUI-WD14-Tagger/pull/63

stanleyftf1005 commented 1 week ago

Any luck fixing it? I'm having the same issue; the WD14 tagger has been a pain to run.

andrewtvuong commented 1 week ago

I just started looking into this yesterday and will try to fix it when I have more time. Just starting the discussion here in case I'm missing something.