microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Offline optimization mode with CUDA EP #9325

Open mitkir opened 2 years ago

mitkir commented 2 years ago

According to the documentation (https://toscode.gitee.com/zonghaofan/onnxruntime/blob/master/docs/ONNX_Runtime_Graph_Optimizations.md), in offline mode ONNX Runtime serializes the resulting model to disk after performing graph optimizations. I tried to create a session from the optimized .ort model with the CUDA Execution Provider, but it performs the optimization again after loading the model, and session creation takes the same amount of time. Could you provide a snippet showing how to use an optimized model in offline mode?

To Reproduce

Creating a session from the original model, with serialization of the optimized model enabled:

    Ort::SessionOptions session_options;
    // Set graph optimization level
    session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);
    // To enable model serialization after graph optimization set this
    session_options.SetOptimizedModelFilePath("optimized_file_path");
    auto session = Ort::Session(env, "model_file_path", session_options); // takes about 10 seconds

Creating a session from the optimized model:

    Ort::SessionOptions session_options;
    auto session = Ort::Session(env, "optimized_file_path", session_options); // takes about 10 seconds

Expected behavior

Creating a session from the optimized model in offline mode should be much faster.
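To compare the two session creation times more precisely than "about 10 seconds", a small timing wrapper could be used. This is only a sketch: `elapsed_ms` is a hypothetical helper name, not part of ONNX Runtime, and it relies only on the C++ standard library.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not an ONNX Runtime API): runs `fn` exactly once and
// returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double elapsed_ms(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping each `Ort::Session` construction in a lambda passed to `elapsed_ms` would give one number for the original model and one for the optimized model, which makes it easier to tell whether the load time really is unchanged.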

skottmckay commented 2 years ago

If you're loading a previously optimized model you can turn optimization off.

session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_DISABLE_ALL);

Side note: you're not enabling the CUDA EP in your example code. Not sure if that was intentional.

mitkir commented 2 years ago

@skottmckay, thank you for the response! I tried disabling optimization this way when loading the optimized model, but it failed. I do use the CUDA EP, and I include the following header at the beginning (before the sample code above).

It may also be worth mentioning that after the initial optimization, some of the nodes execute on the CPU and the rest on the GPU. Could this be the source of the problem?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.