Open jing-bao opened 3 weeks ago
Hi @jing-bao, contributions to MLAS that add Relaxed SIMD support for WebAssembly are definitely welcome!
I have a question about Relaxed SIMD support: if the browser/Node.js version does not support Relaxed SIMD, will the WebAssembly module simply fail to load, or is there still a chance to fall back to the old code?
We definitely don't want it to fail when Relaxed SIMD is not supported. A possible solution in my mind:
We can run a small JS + Wasm snippet to check whether the browser/Node.js supports Relaxed SIMD, as https://github.com/GoogleChromeLabs/wasm-feature-detect does; then we need extra logic in the onnx JS code to choose between ort-wasm-relaxedsimd-threaded.wasm and ort-wasm-simd-threaded.wasm.
From your knowledge of onnx, is this the right approach?
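A minimal sketch of that selection logic, assuming the two file names above and a caller-supplied probe (for example, a feature check from wasm-feature-detect). The names `FeatureProbe`, `selectWasmFile`, and `resolveWasmFile` are hypothetical and are not part of any existing onnx API:

```typescript
// File names as proposed in this thread; the relaxed-SIMD artifact does not exist yet.
const RELAXED_SIMD_WASM = "ort-wasm-relaxedsimd-threaded.wasm";
const BASELINE_SIMD_WASM = "ort-wasm-simd-threaded.wasm";

// Hypothetical probe type: resolves to true when the runtime supports Relaxed SIMD.
type FeatureProbe = () => Promise<boolean>;

// Pure selection step: map the probe result to a binary name.
function selectWasmFile(relaxedSimdSupported: boolean): string {
  return relaxedSimdSupported ? RELAXED_SIMD_WASM : BASELINE_SIMD_WASM;
}

// Run the probe and treat any failure as "not supported",
// so the loader always falls back to the baseline SIMD build.
async function resolveWasmFile(probe: FeatureProbe): Promise<string> {
  let supported = false;
  try {
    supported = await probe();
  } catch {
    supported = false;
  }
  return selectWasmFile(supported);
}
```

Treating a throwing probe as "unsupported" keeps the fallback path safe by default: the worst case is that a capable runtime loads the slower binary.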
I think that should work.
Describe the feature request
Wasm Relaxed SIMD includes integer dot product instructions, which map to VNNI instructions on x86-64 platforms with AVX-VNNI (on ARM probably SDOT, but I haven't tested this) and can greatly improve QGemm performance. Supporting Relaxed SIMD may also enable further optimizations in the future.
I have some local patches that add a Wasm Relaxed SIMD build and VNNI dispatch for QGemmU8X8 to MLAS; they speed up the Segment Anything Model by about 1.15x. Are such modifications welcome?
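For readers unfamiliar with the integer dot product mentioned above, here is a scalar sketch of what one such instruction computes: each 32-bit output lane is an accumulator plus the sum of four 8-bit products. It is an illustrative model only, assuming the second operand stays in the range 0..127 (results outside that range are implementation-defined in Relaxed SIMD); the function name `dotI8x16Add` is hypothetical and this is not the MLAS kernel:

```typescript
// Scalar model of a 16 x int8 dot product accumulated into 4 x int32 lanes.
// Assumption: every value in b is within 0..127.
function dotI8x16Add(a: Int8Array, b: Int8Array, acc: Int32Array): Int32Array {
  if (a.length !== 16 || b.length !== 16 || acc.length !== 4) {
    throw new Error("expected 16 + 16 int8 inputs and 4 int32 accumulators");
  }
  const out = new Int32Array(4);
  for (let lane = 0; lane < 4; lane++) {
    let sum = acc[lane];
    // Each output lane consumes four adjacent 8-bit pairs.
    for (let j = 0; j < 4; j++) {
      sum += a[4 * lane + j] * b[4 * lane + j];
    }
    out[lane] = sum;
  }
  return out;
}
```

A quantized GEMM inner loop repeats this step across the K dimension, which is why a single hardware instruction for it can help QGemm.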
Describe scenario use case
Many Web models are quantized, and can benefit from the integer dot product instructions.