microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Not able to save model when using execution providers #12804

Closed · prikmm closed this issue 2 years ago

prikmm commented 2 years ago

**Describe the bug**

I am trying to save the optimized model by setting optimized_model_filepath in SessionOptions, so that the model is written out after it has been loaded and optimized by the execution providers in the inference session, but I keep getting the error shown below:

---------------------------------------------------------------------------
Fail                                      Traceback (most recent call last)
/home/priyammehta/geant4_par04/notebooks/rough.ipynb Cell 58 in <cell line: 5>()
      2 sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
      3 sess_options.optimized_model_filepath = "/home/priyammehta/geant4_par04/build/MLModels/opt.onnx"
----> 5 session = ort.InferenceSession("/home/priyammehta/geant4_par04/build/MLModels/Generator.onnx",
      6                                providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
      7                                sess_options=sess_options)

File ~/miniconda3/envs/GSOCenv/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:335, in InferenceSession.__init__(self, path_or_bytes, sess_options, providers, provider_options, **kwargs)
    332 disabled_optimizers = kwargs['disabled_optimizers'] if 'disabled_optimizers' in kwargs else None
    334 try:
--> 335     self._create_inference_session(providers, provider_options, disabled_optimizers)
    336 except ValueError:
    337     if self._enable_fallback:

File ~/miniconda3/envs/GSOCenv/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:381, in InferenceSession._create_inference_session(self, providers, provider_options, disabled_optimizers)
    378     disabled_optimizers = set(disabled_optimizers)
    380 # initialize the C++ InferenceSession
--> 381 sess.initialize_session(providers, provider_options, disabled_optimizers)
    383 self._sess = sess
    384 self._sess_options = self._sess.session_options

Fail: [ONNXRuntimeError] : 1 : FAIL : Unable to serialize model as it contains compiled nodes. Please disable any execution providers which generate compiled nodes.

I don't get this error when I use the CUDA Execution Provider, but it happens when I use the oneDNN, OpenVINO, or TensorRT Execution Providers.


**System information**

**To Reproduce**

sess_options = ort.SessionOptions()
sess_options.optimized_model_filepath = "/home/priyammehta/geant4_par04/build/MLModels/opt.onnx"

session = ort.InferenceSession("/home/priyammehta/geant4_par04/build/MLModels/Generator.onnx",
                               providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
                               sess_options=sess_options)


- ONNX model: https://cern.ch/geant4-data/datasets/examples/extended/parameterisations/Par04/Generator.onnx

**Expected behavior**
Seamless saving of the optimized model. The saved model can then be loaded in a new inference session and used as expected.

jywu-msft commented 2 years ago

This is currently a limitation of the implementation, see https://github.com/microsoft/onnxruntime/pull/5840. I'll provide some background context.

OnnxRuntime Execution Providers can be implemented using one of two approaches. The classic approach is to implement kernels which follow the corresponding ONNX operator specification. The implementation signatures are registered in an EP's kernel registry; these signatures are static and known at build time. At runtime, OnnxRuntime assigns nodes in the graph to an EP when it can find a matching signature in that EP's kernel registry. The CUDA Execution Provider follows this pattern of implementation.

The other approach is to implement the Function Kernel (aka Compiled Kernel) interface, where the EP produces a list of subgraphs it supports and the framework rewrites the main graph, fusing each subgraph into a single node. A matching implementation signature is generated by the framework at runtime and used for assigning the fused nodes to the associated EP. The TensorRT, OpenVINO, and oneDNN Execution Providers implement the Function/Compiled Kernel approach.

We don't support serializing graphs which contain fused Execution Provider nodes: the signatures won't exist in the static registry, so the serialized model would fail to load (the model is considered invalid).
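
For anyone who mainly needs to cache ONNX Runtime's graph-level optimizations, one possible two-step workaround (a minimal sketch only, not confirmed in this thread; it reuses the file paths from the report above) is to serialize the optimized model with only the CPU Execution Provider, which uses statically registered kernels, and then load the saved file with the compiled-kernel EP:

import onnxruntime as ort

# Step 1: serialize the optimized graph without any compiled/fused EP nodes
# by creating a session with the CPU Execution Provider only.
save_options = ort.SessionOptions()
save_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
save_options.optimized_model_filepath = "/home/priyammehta/geant4_par04/build/MLModels/opt.onnx"
_ = ort.InferenceSession("/home/priyammehta/geant4_par04/build/MLModels/Generator.onnx",
                         providers=["CPUExecutionProvider"],
                         sess_options=save_options)

# Step 2: load the saved model with the compiled-kernel EP for inference.
# Its subgraph fusion/compilation still happens in memory at session creation;
# it is simply not written to disk.
run_session = ort.InferenceSession("/home/priyammehta/geant4_par04/build/MLModels/opt.onnx",
                                   providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"])

Note that only the framework-level graph optimizations are cached on disk this way; some compiled-kernel EPs offer their own caching mechanisms, which are separate from optimized_model_filepath.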

hariharans29 commented 2 years ago

Closing as this is by design