pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

WebAssembly / Web runtime (both for wasm-simd and WebGPU) #3497

Open vadimkantorov opened 2 weeks ago

vadimkantorov commented 2 weeks ago

I'm wondering whether ExecuTorch can be compiled for a WebAssembly target. As far as I understand, XNNPACK exists for wasm-simd, so theoretically it could be done, at least for CPU? (e.g. to be compared with tflite+tfjs, ort-web and tvm-wasm, at least for some popular models like MobileNets)

(This is especially interesting if strong fusion/codegen can be done to produce fused wasm-simd code/fused WebGPU programs - although maybe this is an ask for Inductor)

SS-JIA commented 2 weeks ago

cc: @mcr229 or @digantdesai regarding running XNNPACK via wasm

SS-JIA commented 2 weeks ago

Also cc: @mergennachin

JacobSzwejbka commented 1 week ago

I've talked with @digantdesai about this before. I think for XNNPACK he mentioned it should just be plug and play. I've been wanting to try out wasm for some time now; I just haven't had the bandwidth.

vadimkantorov commented 1 week ago

I also wonder about the fusion capabilities of executorch :) Does it allow Inductor codegen'd fused kernels (e.g. think quant/dequant fused into the flash attn kernel directly, with positional embedding computation also fused into this kernel)?
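To make "quant/dequant fused into the kernel" concrete, here is a toy sketch in plain Python. It is not ExecuTorch's or Inductor's API, and all function names are hypothetical; it only contrasts an unfused pipeline (dequantize the whole int8 weight into a float copy, then run the matmul) with a fused one that dequantizes each weight at the point where the matmul consumes it, so the float weight matrix is never materialized. A single per-tensor `scale` and `zero_point` is assumed for simplicity.

```python
# Toy illustration of fusing dequantization into a matmul "kernel".
# Plain Python only; no ExecuTorch or Inductor APIs are used.
from typing import List

Matrix = List[List[float]]
QMatrix = List[List[int]]


def dequantize(q: QMatrix, scale: float, zero_point: int) -> Matrix:
    """Materialize a full float copy of a per-tensor quantized matrix."""
    return [[(v - zero_point) * scale for v in row] for row in q]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain float matmul: a is (m x k), b is (k x n)."""
    n = len(b[0])
    return [
        [sum(a_row[p] * b[p][j] for p in range(len(b))) for j in range(n)]
        for a_row in a
    ]


def unfused_dequant_matmul(x: Matrix, qw: QMatrix, scale: float, zp: int) -> Matrix:
    # Two separate steps: the dequantized weight is written out in full,
    # then read back by the matmul.
    return matmul(x, dequantize(qw, scale, zp))


def fused_dequant_matmul(x: Matrix, qw: QMatrix, scale: float, zp: int) -> Matrix:
    # One step: each quantized weight is dequantized right where it is
    # consumed, so no float copy of the weight matrix is ever created.
    n = len(qw[0])
    out = []
    for x_row in x:
        out_row = []
        for j in range(n):
            acc = 0.0
            for p in range(len(qw)):
                acc += x_row[p] * ((qw[p][j] - zp) * scale)
            out_row.append(acc)
        out.append(out_row)
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    qw = [[10, -4], [0, 6]]
    print(fused_dequant_matmul(x, qw, 0.5, 0))
```

Both versions compute the same result; the fused one simply saves the memory traffic of the intermediate float weight, which is the kind of saving a fused wasm-simd or WebGPU kernel would be after.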

Another interesting backend is WebGPU/wgpu: https://github.com/huggingface/ratchet. Even raw wgpu/WGSL shaders could, in theory, be a compilation target for fused kernels.

But even if ExecuTorch does not support aggressive codegen/fusion, it would still be good to have it as a baseline for comparison against ort-web, tflite+tfjs, tvm-wasm and ggml compiled to wasm. This should show roughly where all these frameworks stand (especially if compiling is relatively doable).

vadimkantorov commented 1 week ago

And given that currently PyTorch does not have its own inference wasm/WebGPU story, having executorch compiled to wasm-simd might be a nice baseline to have (especially if it's minimalistic and relatively simple to compile)