microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Missing dll cudnn_ops_infer64_8.dll does not generate a python error #20605

Open martinResearch opened 6 months ago

martinResearch commented 6 months ago

Describe the issue

When creating a session with onnx_sess = InferenceSession(model, providers=["CUDAExecutionProvider"]) while cudnn_ops_infer64_8.dll is missing from the library path (simply renaming this file reproduces the problem), the message "Could not locate cudnn_ops_infer64_8.dll. Please make sure it is in your library path!" is printed to the log and the code stops executing, but no Python exception is raised.

Why is this a problem? Because the message is not an actual Python exception, it is not displayed in the log when running under pytest, for example, which makes it harder to find the cause of a failed test when this DLL is missing. Digging into the Python code, I found that execution stops at https://github.com/microsoft/onnxruntime/blob/737eb48f5c26ed2ac97e6fce0faf0831207d6f59/onnxruntime/python/onnxruntime_inference_collection.py#L483. The pybind binding should raise a Python exception instead of just stopping execution.
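Until the binding raises an exception itself, a test suite could fail loudly by checking for the DLL before creating the session. The sketch below uses hypothetical helpers (find_missing_dlls and require_dlls are not part of ONNX Runtime). It only scans the directories listed in PATH, which approximates but does not reproduce the full Windows DLL search order.

```python
import os
from typing import Iterable, List, Optional, Sequence


def find_missing_dlls(names: Iterable[str],
                      search_dirs: Optional[Sequence[str]] = None) -> List[str]:
    """Return the DLL names that are not present as files in any search directory.

    Hypothetical helper: by default it scans only the PATH entries, which is an
    approximation of the real Windows loader search order.
    """
    if search_dirs is None:
        search_dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    missing = []
    for name in names:
        if not any(os.path.isfile(os.path.join(d, name)) for d in search_dirs):
            missing.append(name)
    return missing


def require_dlls(names: Iterable[str],
                 search_dirs: Optional[Sequence[str]] = None) -> None:
    """Raise FileNotFoundError (visible in pytest output) if any DLL is missing."""
    missing = find_missing_dlls(names, search_dirs)
    if missing:
        raise FileNotFoundError(
            "Missing DLL(s) on the search path: " + ", ".join(missing))
```

For example, calling require_dlls(["cudnn_ops_infer64_8.dll"]) in a pytest fixture before constructing the InferenceSession would turn the silent stop into a test failure that names the missing file.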

To reproduce

Urgency

No response

Platform

Windows

OS Version

11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

onnxruntime-gpu 1.16.3

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.8

yuslepukhin commented 6 months ago

DLL loading, especially of indirect dependencies, is handled by the OS. The message you are seeing comes from the system loader. Neither ORT nor Python has any control over that.

martinResearch commented 6 months ago

I understand from your response that neither ORT nor Python can change the error message the OS generates when it fails to load the DLL. But I don't see why that would imply that ORT has no way to detect that the OS failed to load the library and raise an error in that case. It seems to me that if loading the DLL fails, out_module would be nullptr on this line: https://github.com/microsoft/onnxruntime/blob/58d7b1220550f87ad58a195dc5605fa8c23fe98f/winml/lib/Api.Ort/OnnxruntimeEnvironment.cpp#L43C1-L45C4, and we should then be able to throw an error that gets propagated to Python. Am I missing something?
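The pattern proposed here, turning a failed explicit load into an error the caller can see, can be sketched in Python with ctypes, whose CDLL constructor raises OSError when the OS loader cannot open a library. This is an analogy only, not ONNX Runtime's code; MissingLibraryError and load_or_raise are hypothetical names, and the sketch only illustrates explicit, run-time loading.

```python
import ctypes


class MissingLibraryError(RuntimeError):
    """Hypothetical exception raised when a shared library cannot be loaded."""


def load_or_raise(name: str) -> ctypes.CDLL:
    """Load a shared library explicitly and surface loader failures as an exception.

    ctypes.CDLL raises OSError (or a subclass) when the OS loader fails; this
    wrapper re-raises it with the library name so the caller sees what went wrong.
    """
    try:
        return ctypes.CDLL(name)
    except OSError as exc:
        # Python analogue of checking for a null module handle and throwing.
        raise MissingLibraryError(f"Could not load {name!r}: {exc}") from exc
```

On a machine where the file is absent, load_or_raise("cudnn_ops_infer64_8.dll") would raise MissingLibraryError, which pytest reports like any other exception.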

yuslepukhin commented 6 months ago

https://learn.microsoft.com/en-us/windows/win32/dlls/load-time-dynamic-linking