Optimize Transpose around QLinearSoftmax

microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

https://onnxruntime.ai

MIT License


Closed: yihonglyu closed this 3 days ago

yihonglyu commented 1 week ago

By merging and eliminating redundant Transpose nodes around QLinearSoftmax, the Image Segmentation i8 model (MobileNetv2 + DeepLabv3) achieves a 2.34x speedup.
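For readers unfamiliar with why such Transpose nodes can be dropped, here is a minimal sketch of the general idea, not the code in this PR. It assumes a softmax-style op with an `axis` attribute sitting between two Transpose nodes whose permutations cancel; in that case the pair can be removed and the softmax axis remapped. The helper names (`compose_perms`, `fold_transposes_around_softmax`) are hypothetical and exist only for illustration.

```python
def compose_perms(first, second):
    """Permutation equivalent to Transpose(first) followed by Transpose(second).

    Uses the ONNX convention: output dim i takes input dim perm[i].
    """
    return [first[i] for i in second]


def is_identity(perm):
    return list(perm) == list(range(len(perm)))


def fold_transposes_around_softmax(perm_in, axis, perm_out):
    """Return the remapped softmax axis if the two transposes cancel, else None.

    Pattern (assumed): Transpose(perm_in) -> Softmax(axis) -> Transpose(perm_out)
    """
    rank = len(perm_in)
    axis = axis % rank  # normalize a negative axis
    if not is_identity(compose_perms(perm_in, perm_out)):
        return None  # transposes do not cancel, so leave the graph alone
    # Axis `axis` of the transposed tensor is axis perm_in[axis] of the original.
    return perm_in[axis]


# NCHW -> NHWC, softmax over the last axis (channels), then NHWC -> NCHW.
new_axis = fold_transposes_around_softmax([0, 2, 3, 1], -1, [0, 3, 1, 2])
print(new_axis)  # 1: softmax can run directly on the channel axis of the NCHW tensor
```

When the function returns an axis, the three-node pattern can in principle be replaced by a single softmax on that axis, which avoids two full tensor copies. Whether the PR uses this exact rewrite or a different merging strategy is not stated in the description.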