Closed: skinnynpale closed this issue 2 months ago
For `CUDA failure 804: forward compatibility was attempted on non supported HW`, the recommendation is to update your CUDA driver; see https://forums.developer.nvidia.com/t/forward-compatibility-was-attempted-on-non-supported-hw/204254/6

Try installing driver 555 from https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local
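Error 804 generally means the installed driver is older than what the CUDA runtime in the `onnxruntime-gpu` wheel requires, so CUDA attempts forward compatibility, which only works on data-center GPUs. A minimal sketch of the version check involved; the minimum-driver table below is an assumption for illustration, so verify the exact values against NVIDIA's CUDA compatibility documentation:

```python
# Illustrative minimum Linux driver branch per CUDA runtime version.
# These thresholds are assumptions; consult NVIDIA's compatibility
# table for authoritative numbers.
MIN_DRIVER = {
    (12, 4): (550, 54),
    (12, 0): (525, 60),
    (11, 8): (450, 80),
}

def driver_satisfies(runtime: tuple, driver: tuple) -> bool:
    """Return True if the driver meets the assumed minimum for the
    given CUDA runtime. False means CUDA would try forward
    compatibility, which fails with error 804 on consumer GPUs."""
    minimum = MIN_DRIVER.get(runtime)
    if minimum is None:
        return False  # unknown runtime: be conservative
    return driver >= minimum

# A 535-series driver with a CUDA 12.4 runtime is too old, which is
# exactly the situation that triggers forward compatibility:
print(driver_satisfies((12, 4), (535, 104)))  # False
print(driver_satisfies((12, 4), (555, 42)))   # True
```

In practice, compare the driver version reported by `nvidia-smi` against the CUDA runtime version the wheel was built for (here, CUDA 12.4).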
Describe the issue
```
EP Error
EP Error /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 804: forward compatibility was attempted on non supported HW ; GPU=22179 ; hostname=9407c20fa6b6 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=280 ; expr=cudaSetDevice(info.device_id);
 when using ['CUDAExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
```
```
2024-07-21 23:37:56.704 Uncaught app exception
Traceback (most recent call last):
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 804: forward compatibility was attempted on non supported HW ; GPU=22179 ; hostname=9407c20fa6b6 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=280 ; expr=cudaSetDevice(info.device_id);

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 589, in _run_script
    exec(code, module.__dict__)
  File "/root/sasha-ai-workflow/src/chat-service/src/v1-streamlit.py", line 6, in <module>
    from EmbeddingCache import EmbeddingCache
  File "/root/sasha-ai-workflow/src/chat-service/src/EmbeddingCache.py", line 40, in <module>
    embedding_model = TextEmbedding(
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/fastembed/text/text_embedding.py", line 61, in __init__
    self.model = EMBEDDING_MODEL_TYPE(
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/fastembed/text/onnx_embedding.py", line 237, in __init__
    self.load_onnx_model(
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/fastembed/common/onnx_model.py", line 80, in load_onnx_model
    self.model = ort.InferenceSession(
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 432, in __init__
    raise fallback_error from e
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 427, in __init__
    self._create_inference_session(self._fallback_providers, None)
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 804: forward compatibility was attempted on non supported HW ; GPU=22179 ; hostname=9407c20fa6b6 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=280 ; expr=cudaSetDevice(info.device_id);
```
To reproduce
```shell
!pip install onnxruntime-gpu -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ -qq
!pip install fastembed-gpu -qqq
```
```python
from typing import List

import numpy as np
from fastembed import TextEmbedding

embedding_model_gpu = TextEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    providers=["CUDAExecutionProvider"],
)
embedding_model_gpu.model.model.get_providers()
```
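Note that the automatic fallback also fails here because the fallback list still contains `CUDAExecutionProvider`, so `cudaSetDevice` is attempted a second time. A hedged workaround sketch while the driver is being fixed: select providers explicitly based on what onnxruntime reports, so a broken CUDA setup degrades to CPU instead of crashing. The selection logic is plain Python and shown standalone; in the real script you would feed it the output of `onnxruntime.get_available_providers()`:

```python
def choose_providers(available: list) -> list:
    """Prefer CUDA when onnxruntime reports it as available,
    otherwise request CPU only so no cudaSetDevice call is made.

    `available` is expected to be the list returned by
    onnxruntime.get_available_providers().
    """
    if "CUDAExecutionProvider" in available:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]

print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(choose_providers(["CPUExecutionProvider"]))

# In the repro script (assuming fastembed's providers kwarg as above):
#   import onnxruntime as ort
#   providers = choose_providers(ort.get_available_providers())
#   model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5", providers=providers)
```

Keep in mind `get_available_providers()` only reports compile-time availability, so on a machine with a too-old driver it may still list CUDA; the driver update remains the real fix.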
Urgency
No response
Platform
Linux
OS Version
Ubuntu 22.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.18.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.4