microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Feature Request] Add Wasm Relaxed SIMD support and integer dot product instructions for ONNX Runtime Web #22533

Open jing-bao opened 3 weeks ago

jing-bao commented 3 weeks ago

Describe the feature request

Wasm Relaxed SIMD includes integer dot product instructions, which map to VNNI instructions on x86-64 platforms with AVX-VNNI (and possibly to SDOT on Arm, though I haven't tested that) and can greatly improve QGemm performance. More optimizations may follow once Relaxed SIMD is supported.
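For illustration, here is a scalar sketch of what one of these instructions, i32x4.relaxed_dot_i8x16_i7x16_add, computes; the TypeScript function name and typed-array calling convention are invented for this example. AVX-VNNI's VPDPBUSD performs the same 4-way multiply-accumulate in a single instruction, which is why the mapping pays off for quantized GEMM:

```ts
// Scalar model of i32x4.relaxed_dot_i8x16_i7x16_add(a, b, acc).
// a: 16 signed 8-bit lanes; b: 16 lanes whose values must stay in 0..127
// (the "i7" restriction that keeps the relaxed lowering well-defined);
// acc: 4 signed 32-bit lanes.
function relaxedDotI8x16I7x16Add(
  a: Int8Array,    // 16 elements
  b: Int8Array,    // 16 elements, each expected in 0..127
  acc: Int32Array, // 4 elements
): Int32Array {
  const out = Int32Array.from(acc);
  for (let lane = 0; lane < 4; lane++) {
    // Each i32 output lane accumulates a 4-way dot product of bytes.
    for (let j = 0; j < 4; j++) {
      out[lane] += a[4 * lane + j] * b[4 * lane + j];
    }
  }
  return out;
}
```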

I have some local patches that add a Wasm Relaxed SIMD build and VNNI dispatch for QGemmU8X8 to MLAS; they improve Segment Anything Model performance by ~1.15x. Would such modifications be welcome?

Describe scenario use case

Many Web models are quantized, and can benefit from the integer dot product instructions.

fs-eire commented 3 weeks ago

Hi @jing-bao, contributions to MLAS to support Relaxed SIMD for WebAssembly are definitely welcome!

I have a question regarding the Relaxed SIMD support: if the browser/Node.js version does not support Relaxed SIMD, will loading the WebAssembly simply fail, or is there still a chance to fall back to the old code?

jing-bao commented 3 weeks ago

We definitely don't want it to fail when Relaxed SIMD is not supported. A possible solution I have in mind:

We can run a small js+wasm code snippet to test whether the browser/Node.js supports Relaxed SIMD, like https://github.com/GoogleChromeLabs/wasm-feature-detect does, and then add extra logic to the ONNX Runtime Web JS code to choose between ort-wasm-relaxedsimd-threaded.wasm and ort-wasm-simd-threaded.wasm (see the sketch below).
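A minimal sketch of such a probe, following the wasm-feature-detect approach: WebAssembly.validate accepts the tiny module below only if the engine understands the relaxed-simd encoding (here, i8x16.relaxed_swizzle). The byte array is hand-assembled for this example, and the file-selection snippet only illustrates where the fallback decision could live, not ORT's actual loader logic:

```ts
// Probe for Relaxed SIMD support by validating a module that uses
// a relaxed-simd instruction; engines without the proposal reject it.
const supportsRelaxedSimd = (): boolean =>
  WebAssembly.validate(new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // \0asm magic + version 1
    0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7b,       // type section: () -> v128
    0x03, 0x02, 0x01, 0x00,                         // func section: one func, type 0
    0x0a, 0x0f, 0x01, 0x0d, 0x00,                   // code section: one 13-byte body, no locals
    0x41, 0x01, 0xfd, 0x0f,                         // i32.const 1; i8x16.splat
    0x41, 0x02, 0xfd, 0x0f,                         // i32.const 2; i8x16.splat
    0xfd, 0x80, 0x02,                               // i8x16.relaxed_swizzle (opcode 0x100)
    0x0b,                                           // end
  ]));

// Sketch of the fallback decision between the two builds named above.
const wasmFile = supportsRelaxedSimd()
  ? 'ort-wasm-relaxedsimd-threaded.wasm'
  : 'ort-wasm-simd-threaded.wasm';
```

Since WebAssembly.validate is available in both browsers and Node.js, one probe could serve both environments.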

From your knowledge of ONNX Runtime, is this the right way to do it?

fs-eire commented 3 weeks ago

I think that should work.