microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

slower after graph optimization! #10538

**Open** · shadowwalker2718 opened this issue 2 years ago

shadowwalker2718 commented 2 years ago

**Describe the bug**
One of my teammates said he followed this page: https://onnxruntime.ai/docs/performance/graph-optimizations.html to generate an offline-optimized model. It says:

> All optimizations can be performed either online or offline. In online mode, when initializing an inference session, we also apply all enabled graph optimizations before performing model inference. Applying all optimizations each time we initiate a session can add overhead to the model startup time (especially for complex models), which can be critical in production scenarios. This is where the offline mode can bring a lot of benefit. In offline mode, after performing graph optimizations, ONNX Runtime serializes the resulting model to disk. Subsequently, we can reduce startup time by using the already optimized model and disabling all optimizations.
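For reference, this is roughly the workflow he followed, per that page (a sketch using the Python API; the model paths are placeholders for our real ones):

```python
import onnxruntime as ort

# One-time offline step: apply graph optimizations and serialize the result.
opt_options = ort.SessionOptions()
opt_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
# Setting optimized_model_filepath makes ORT write the optimized graph to disk.
opt_options.optimized_model_filepath = "model.opt.onnx"  # placeholder output path
ort.InferenceSession("model.onnx", opt_options)          # placeholder input path

# Production step: load the pre-optimized model with optimizations disabled.
run_options = ort.SessionOptions()
run_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
session = ort.InferenceSession("model.opt.onnx", run_options)
```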

After optimization, I found that the model loading time is almost the same, but inference time is even slower than before. I am testing the two models on the same machine, so everything is the same except the model. Does anybody know why?

**Urgency**
Need to fix it within 2 days.

**System information**

**To Reproduce**
Just save the optimized model, replace the original model with it, and run the same inference code, as in the timing sketch below.
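This is the kind of comparison we ran (a minimal sketch; the input name, shape, and model paths below are placeholders for our real ones):

```python
import time
import numpy as np
import onnxruntime as ort

def bench(model_path, opt_level, n_runs=100):
    opts = ort.SessionOptions()
    opts.graph_optimization_level = opt_level
    t0 = time.perf_counter()
    session = ort.InferenceSession(model_path, opts)
    load_s = time.perf_counter() - t0

    # Placeholder input; the real model has its own input name/shape/dtype.
    feed = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
    session.run(None, feed)  # warm-up run, excluded from timing
    t0 = time.perf_counter()
    for _ in range(n_runs):
        session.run(None, feed)
    avg_ms = (time.perf_counter() - t0) / n_runs * 1000
    print(f"{model_path}: load {load_s:.2f}s, avg inference {avg_ms:.2f}ms")

# Original model with online optimization vs. pre-optimized model with none.
bench("model.onnx", ort.GraphOptimizationLevel.ORT_ENABLE_ALL)
bench("model.opt.onnx", ort.GraphOptimizationLevel.ORT_DISABLE_ALL)
```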

**Expected behavior**
The new model should be faster than the original.

guoyu-wang commented 2 years ago

Care to share the model(s) so that we can take a look?

shadowwalker2718 commented 2 years ago

I cannot share the model... but let me try to find a public model that reproduces the issue :-(