Closed: skinnynpale closed this issue 2 months ago
For `CUDA failure 804: forward compatibility was attempted on non supported HW`, the recommendation is to update your CUDA driver; see https://forums.developer.nvidia.com/t/forward-compatibility-was-attempted-on-non-supported-hw/204254/6

Try installing driver 555 from https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local
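Error 804 generally means the installed driver is older than what the CUDA runtime in the `onnxruntime-gpu` wheel requires, so CUDA attempts forward compatibility, which only works on data-center GPUs. A minimal sketch of the version check involved; the minimum-driver table below is an assumption for illustration, so verify the exact values against NVIDIA's CUDA compatibility documentation:

```python
# Illustrative minimum Linux driver branch per CUDA runtime version.
# These thresholds are assumptions; consult NVIDIA's compatibility
# table for authoritative numbers.
MIN_DRIVER = {
    (12, 4): (550, 54),
    (12, 0): (525, 60),
    (11, 8): (450, 80),
}

def driver_satisfies(runtime: tuple, driver: tuple) -> bool:
    """Return True if the driver meets the assumed minimum for the
    given CUDA runtime. False means CUDA would try forward
    compatibility, which fails with error 804 on consumer GPUs."""
    minimum = MIN_DRIVER.get(runtime)
    if minimum is None:
        return False  # unknown runtime: be conservative
    return driver >= minimum

# A 535-series driver with a CUDA 12.4 runtime is too old, which is
# exactly the situation that triggers forward compatibility:
print(driver_satisfies((12, 4), (535, 104)))  # False
print(driver_satisfies((12, 4), (555, 42)))   # True
```

In practice, compare the driver version reported by `nvidia-smi` against the CUDA runtime version the wheel was built for (here, CUDA 12.4).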
Describe the issue
```
EP Error
EP Error /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 804: forward compatibility was attempted on non supported HW ; GPU=22179 ; hostname=9407c20fa6b6 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=280 ; expr=cudaSetDevice(info.device_id);
 when using ['CUDAExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
```
```
2024-07-21 23:37:56.704 Uncaught app exception
Traceback (most recent call last):
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 804: forward compatibility was attempted on non supported HW ; GPU=22179 ; hostname=9407c20fa6b6 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=280 ; expr=cudaSetDevice(info.device_id);

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 589, in _run_script
    exec(code, module.__dict__)
  File "/root/sasha-ai-workflow/src/chat-service/src/v1-streamlit.py", line 6, in <module>
    from EmbeddingCache import EmbeddingCache
  File "/root/sasha-ai-workflow/src/chat-service/src/EmbeddingCache.py", line 40, in <module>
    embedding_model = TextEmbedding(
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/fastembed/text/text_embedding.py", line 61, in __init__
    self.model = EMBEDDING_MODEL_TYPE(
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/fastembed/text/onnx_embedding.py", line 237, in __init__
    self.load_onnx_model(
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/fastembed/common/onnx_model.py", line 80, in load_onnx_model
    self.model = ort.InferenceSession(
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 432, in __init__
    raise fallback_error from e
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 427, in __init__
    self._create_inference_session(self._fallback_providers, None)
  File "/root/sasha-ai-workflow/src/chat-service/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 804: forward compatibility was attempted on non supported HW ; GPU=22179 ; hostname=9407c20fa6b6 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=280 ; expr=cudaSetDevice(info.device_id);
```
To reproduce
```shell
!pip install onnxruntime-gpu -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ -qq
!pip install fastembed-gpu -qqq
```
```python
from typing import List

import numpy as np
from fastembed import TextEmbedding

embedding_model_gpu = TextEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    providers=["CUDAExecutionProvider"],
)
embedding_model_gpu.model.model.get_providers()
```
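Note that the automatic fallback also fails here because the fallback list still contains `CUDAExecutionProvider`, so `cudaSetDevice` is attempted a second time. A hedged workaround sketch while the driver is being fixed: select providers explicitly based on what onnxruntime reports, so a broken CUDA setup degrades to CPU instead of crashing. The selection logic is plain Python and shown standalone; in the real script you would feed it the output of `onnxruntime.get_available_providers()`:

```python
def choose_providers(available: list) -> list:
    """Prefer CUDA when onnxruntime reports it as available,
    otherwise request CPU only so no cudaSetDevice call is made.

    `available` is expected to be the list returned by
    onnxruntime.get_available_providers().
    """
    if "CUDAExecutionProvider" in available:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]

print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(choose_providers(["CPUExecutionProvider"]))

# In the repro script (assuming fastembed's providers kwarg as above):
#   import onnxruntime as ort
#   providers = choose_providers(ort.get_available_providers())
#   model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5", providers=providers)
```

Keep in mind `get_available_providers()` only reports compile-time availability, so on a machine with a too-old driver it may still list CUDA; the driver update remains the real fix.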
Urgency
No response
Platform
Linux
OS Version
Ubuntu 22.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.18.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.4