microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License
14.77k stars 2.94k forks

Optimize Transpose around QLinearSoftmax #22849

Closed yihonglyu closed 3 days ago

yihonglyu commented 1 week ago

Description

Motivation and Context

By merging and eliminating redundant Transpose nodes around QLinearSoftmax, the Image Segmentation i8 model (MobileNetv2 + DeepLabv3) achieves a 2.34x speedup.
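
For illustration only, here is a minimal sketch of the permutation bookkeeping behind this kind of rewrite. It is not the code in this PR. It assumes ONNX-style `perm` semantics (output axis `i` reads input axis `perm[i]`) and a softmax that normalizes over a single axis. The helper names `compose_perms`, `is_identity`, and `fold_transposes_around_softmax` are hypothetical.

```python
def compose_perms(first, second):
    """Perm equivalent to applying Transpose(first), then Transpose(second)."""
    # Output axis i of the second transpose reads axis second[i] of the
    # intermediate tensor, which in turn reads axis first[second[i]] of the input.
    return [first[i] for i in second]


def is_identity(perm):
    """True if the permutation leaves the axis order unchanged."""
    return list(perm) == list(range(len(perm)))


def fold_transposes_around_softmax(perm_in, softmax_axis, perm_out):
    """Return the softmax axis to use on the untransposed input, or None.

    Pattern: Transpose(perm_in) -> Softmax(axis) -> Transpose(perm_out).
    If perm_out undoes perm_in, both transposes can be dropped and the
    softmax applied directly on input axis perm_in[axis].
    (Assumption: softmax acts on exactly one axis.)
    """
    rank = len(perm_in)
    axis = softmax_axis % rank  # normalize negative axes
    if not is_identity(compose_perms(perm_in, perm_out)):
        return None  # the transposes do not cancel; leave the graph alone
    return perm_in[axis]


# Example: NCHW tensor, softmax over channels expressed as
# Transpose(NCHW->NHWC) -> Softmax(axis=-1) -> Transpose(NHWC->NCHW).
nchw_to_nhwc = [0, 2, 3, 1]
nhwc_to_nchw = [0, 3, 1, 2]
new_axis = fold_transposes_around_softmax(nchw_to_nhwc, -1, nhwc_to_nchw)
# new_axis is 1: softmax over the channel axis of the original NCHW tensor.
```

In this sketch, the two Transpose nodes disappear entirely when they are inverses of each other, and two back-to-back transposes that do not cancel can still be merged into one using `compose_perms`.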