Closed KhanhDinhDuy closed 2 months ago
It's likely some operator is not thread-safe. To identify which operator has the issue in CUDA, we can force the op to fall back to CPU (this requires commenting out some operators in RegisterCudaContribKernels or RegisterCudaKernels and building from source). If you need assistance, please share your ONNX model and an example input that can reproduce the issue.
Thank you for your recommendation. I found a solution here: https://github.com/microsoft/onnxruntime/issues/15154. Setting "options.enable_mem_pattern = False" resolved my problem. Thank you!
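For reference, applying that setting looks roughly like this (a minimal configuration sketch; the model path "model.onnx" is a placeholder, and the providers mirror the issue's setup):

```python
import onnxruntime as ort

options = ort.SessionOptions()
options.enable_mem_pattern = False  # the setting that resolved the issue

# "model.onnx" is a placeholder path for the exported OCR model
session = ort.InferenceSession(
    "model.onnx",
    sess_options=options,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```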
Describe the issue
I have an OCR model with the following architecture: ResNet-BiLSTM-CTC.
Environment:
cuda_provider_options = {'gpu_mem_limit': 2 * 1024 * 1024 * 1024}
providers = [("CUDAExecutionProvider", cuda_provider_options), "CPUExecutionProvider"]
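For clarity, gpu_mem_limit is specified in bytes, so the value above works out to 2 GiB:

```python
# gpu_mem_limit is given in bytes; 2 * 1024 * 1024 * 1024 is 2 GiB
gpu_mem_limit = 2 * 1024 * 1024 * 1024
print(gpu_mem_limit)  # 2147483648
```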
When I test the model normally with only one main process and dynamic batch sizes during inference, the model runs as expected.
But when I serve it with Flask and multithreading (2 threads), I get unexpected outputs. Most of the time the outputs match my expectations, but sometimes I get something strange like "", "c Dc D A ct D c t I m I N i o cI n c", etc.
Note: I use the same samples during the test.
To reproduce
Note:
Step 1: Normal inference after training. When I test the model normally with only one main process and dynamic batch sizes during inference, the model runs as expected.
Step 2: Serve the model with Flask and multithreading. When I serve it with Flask and multithreading (2 threads), I get unexpected outputs. Most of the time the outputs match my expectations, but sometimes I get something strange like "", "c Dc D A ct D c t I m I N i o cI n c", etc.
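A common workaround when a session is not thread-safe is to serialize calls to it with a lock. The sketch below uses an invented FakeSession stand-in (not onnxruntime) whose shared buffer would interleave under concurrent calls, mimicking the scrambled OCR strings above; with a real session you would guard session.run the same way:

```python
import threading

class FakeSession:
    """Illustrative stand-in for a non-thread-safe inference session."""
    def __init__(self):
        self._buf = []

    def run(self, text):
        # Writes through a shared buffer character by character; concurrent
        # callers would interleave writes, like the scrambled OCR output.
        self._buf = []
        for ch in text:
            self._buf.append(ch)
        return "".join(self._buf)

session = FakeSession()
lock = threading.Lock()
results = {}

def infer(name, text):
    with lock:  # serialize access to the shared, non-thread-safe session
        results[name] = session.run(text)

threads = [
    threading.Thread(target=infer, args=(i, f"sample-{i}")) for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results.values()))  # ['sample-0', 'sample-1']
```

Note this trades throughput for correctness; creating one session per thread (or using the mem-pattern fix above in the thread) avoids the serialization.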
When the error occurs:
Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-gpu==1.14.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
cuda:11.6.2
Model File
No response
Is this a quantized model?
No