Open Pierrick-Pochelu opened 2 years ago
Your observations are correct: we do not have to do anything during loading/initialization because all of the constants, for example, are loaded directly from the binary. We currently don't optimize conv ops, which are the predominant performance component of ResNet and other conv-based networks. We are working on it. If you are interested in contributing to the performance of ONNX-MLIR, we welcome contributions from all.
Hi @AlexandreEichenberger, Thanks for sharing those details.
I was recently exploring this project and tested the inference of the generated model. I tested efficientnet_b0 from torchvision.
On my machine, which has an old 5th Gen Intel i3:
122 ms ± 32.3 ms per loop (mean ± std. dev. of 20 runs, 10 loops each)
64.2 ms ± 10.5 ms per loop (mean ± std. dev. of 20 runs, 10 loops each)
1.2 s ± 55 ms per loop (mean ± std. dev. of 20 runs, 10 loops each)
I'm wondering whether I'm doing something wrong or this is the expected inference speed for this model.
What kinds of models is this project intended to be used with to get faster inference?
Thanks
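The numbers above are in the format IPython's `%timeit` magic prints. For reference, an equivalent stdlib sketch that produces the same kind of "mean ± std. dev. of 20 runs, 10 loops each" measurement might look like this (the `predict` function here is a hypothetical stand-in for a single inference call on the compiled model):

```python
import timeit
import statistics

def predict():
    # Hypothetical stand-in for one inference call on the compiled model.
    return sum(i * i for i in range(1000))

# Mirror %timeit's "20 runs, 10 loops each": each run times 10 back-to-back calls.
runs = timeit.repeat(predict, repeat=20, number=10)
per_loop_ms = [r / 10 * 1000 for r in runs]  # per-call time in milliseconds
mean = statistics.mean(per_loop_ms)
std = statistics.stdev(per_loop_ms)
print(f"{mean:.3g} ms ± {std:.2g} ms per loop "
      f"(mean ± std. dev. of 20 runs, 10 loops each)")
```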
Hello, I start with a ResNet50 ONNX file or a VGG19 ONNX file. I compile it to generate a .so shared library file. Everything is working fine.
Now I want to perform two things: 1) load/initialize the model in memory, and 2) predict quickly with it as data samples arrive.
I implemented it by copy/pasting from the docs, but I am hitting a critical performance issue. The initialization step is very fast (suspiciously so, <0.01 s), and the prediction is very slow compared to other inference frameworks (several minutes). I suspect the ExecutionSession constructor is not loading and initializing the model, and that the run method is loading and running it instead of only predicting.
How can I load/initialize once and perform fast predictions each time new data arrive?
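The intended pattern is to construct the session once and reuse it for every prediction. A minimal sketch of that separation, with the onnx-mlir-specific calls shown only as commented, hypothetical usage (the PyRuntime class name and `run` signature may differ between onnx-mlir versions):

```python
class CompiledModel:
    """Wraps any session object exposing run(inputs) -> outputs."""

    def __init__(self, session):
        # All one-time loading/initialization happens here, exactly once.
        self.session = session

    def predict(self, batch):
        # Hot path: only inference, nothing is reloaded from the .so file.
        return self.session.run([batch])

# Hypothetical onnx-mlir usage (API names assumed, check your version's docs):
#   import numpy as np
#   from PyRuntime import ExecutionSession
#   model = CompiledModel(ExecutionSession("resnet50.so"))  # load once
#   out = model.predict(np.random.rand(1, 3, 224, 224).astype(np.float32))
```

If `predict` still takes minutes while construction is near-instant, that is consistent with the real initialization cost being deferred into the first `run` call, which is worth confirming by timing the first and second predictions separately.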