microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

slower after graph optimization! #10538

**Open** · shadowwalker2718 opened this issue 2 years ago

shadowwalker2718 commented 2 years ago

**Describe the bug**
One of my teammates said he followed this page: https://onnxruntime.ai/docs/performance/graph-optimizations.html to generate an offline-optimized model. It says:

> All optimizations can be performed either online or offline. In online mode, when initializing an inference session, we also apply all enabled graph optimizations before performing model inference. Applying all optimizations each time we initiate a session can add overhead to the model startup time (especially for complex models), which can be critical in production scenarios. This is where the offline mode can bring a lot of benefit. In offline mode, after performing graph optimizations, ONNX Runtime serializes the resulting model to disk. Subsequently, we can reduce startup time by using the already optimized model and disabling all optimizations.
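For reference, this is roughly the workflow he followed, per that page (a sketch using the Python API; the model paths are placeholders for our real ones):

```python
import onnxruntime as ort

# One-time offline step: apply graph optimizations and serialize the result.
opt_options = ort.SessionOptions()
opt_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
# Setting optimized_model_filepath makes ORT write the optimized graph to disk.
opt_options.optimized_model_filepath = "model.opt.onnx"  # placeholder output path
ort.InferenceSession("model.onnx", opt_options)          # placeholder input path

# Production step: load the pre-optimized model with optimizations disabled.
run_options = ort.SessionOptions()
run_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
session = ort.InferenceSession("model.opt.onnx", run_options)
```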

After optimization, I found that the model loading time is almost the same, but inference time is even slower than before. I am testing the two models on the same machine, so everything is the same except the model. Does anybody know why?

**Urgency**
Need to fix it within 2 days.

**System information**

**To Reproduce**
Just save the optimized model, replace the original model with it, and run the same inference code, as in the timing sketch below.
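This is the kind of comparison we ran (a minimal sketch; the input name, shape, and model paths below are placeholders for our real ones):

```python
import time
import numpy as np
import onnxruntime as ort

def bench(model_path, opt_level, n_runs=100):
    opts = ort.SessionOptions()
    opts.graph_optimization_level = opt_level
    t0 = time.perf_counter()
    session = ort.InferenceSession(model_path, opts)
    load_s = time.perf_counter() - t0

    # Placeholder input; the real model has its own input name/shape/dtype.
    feed = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
    session.run(None, feed)  # warm-up run, excluded from timing
    t0 = time.perf_counter()
    for _ in range(n_runs):
        session.run(None, feed)
    avg_ms = (time.perf_counter() - t0) / n_runs * 1000
    print(f"{model_path}: load {load_s:.2f}s, avg inference {avg_ms:.2f}ms")

# Original model with online optimization vs. pre-optimized model with none.
bench("model.onnx", ort.GraphOptimizationLevel.ORT_ENABLE_ALL)
bench("model.opt.onnx", ort.GraphOptimizationLevel.ORT_DISABLE_ALL)
```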

**Expected behavior**
The new model should be faster than the original.

guoyu-wang commented 2 years ago

Care to share the model(s) so that we can take a look?

shadowwalker2718 commented 2 years ago

I cannot share the model... but let me try to find a public model that reproduces the issue :-(