Open tk4218 opened 1 year ago
After some further testing, I manually updated my model to match the IR and opset versions of the arcfaceresnet100-8 model (IR version 3, opset version 8), and that resolved the node differences in the profiles. I'm now seeing 51 ReorderInput/ReorderOutput, 152 Conv, 2 BatchNormalization, etc.
It is now clear that the Conv and PRelu execution times are what make my model slow, yet I still don't see any differences in those nodes compared with the other model. One thing to note is that my model's weights are significantly smaller in magnitude (for example, -4.930378685479061e-25 vs. 0.00033268501283600926), though both are float32.
I'm not sure whether the weight values could cause a slowdown, but I'm struggling to find anything else in my Conv/PRelu nodes that would explain it.
I have an ArcFace/ResNet100 model that I trained using InsightFace's MXNet training pipeline. For inference, I converted the model to ONNX with the help of https://github.com/linghu8812/tensorrt_inference/blob/master/project/arcface/export_onnx.py.
My converted model produces correct inference results, but it is extremely slow. For reference, I compared it with the ArcFace model provided in this repository (arcfaceresnet100-8.onnx): inference with my model takes ~7 seconds, whereas the reference model takes < 1 second.
Comparing the two models in Netron, all of the nodes, attributes, and input/output shapes are the same (the weights differ, obviously). However, when I run the ONNX Runtime profiler on the two models, there are a few differences. I've attached the profile logs for both models.
profile_arcfaceresnet100-8.txt
profile_model-opt.txt
There are a few differences between the two logs. Mainly (mine vs. arcfaceresnet100-8):
I am not sure where these differences come from during conversion. It is critical that my converted model runs with performance similar to the arcfaceresnet100-8 model. I've tried running my model through simplifiers, optimizers, etc., but with no improvement.
Here are my environment details:
- OS: Linux Ubuntu Server 20.04
- Python: 3.8
- MXNet: 1.9.1
- ONNX Runtime: 1.14.0
- ONNX: 1.13.0
- ONNX IR version: 8
- ONNX opset version: 18
If anyone could provide insight into why my model performs slower, or why the execution profiles differ, that would be extremely helpful.