microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Web] External data support missing in ORTW - needed for 2GB+ models like SD1.5, SDXL, and Dolly #17151

Closed · fdwr closed this issue 6 months ago

fdwr commented 1 year ago

Describe the issue

ONNX supports storing weights in external data files to exceed the 2 GB Protobuf limit (e.g. model.onnx + weights.onnxdata), but ORT Web doesn't support this, which limits it to smaller models (e.g. no SDXL or Dolly support). The kinks are being ironed out in Emscripten and Chromium for WASM Memory64 to support a 4 GB+ memory space, but without external data support ORT still won't be able to load these models. Note that the .ort format has the same 2 GB issue due to the FlatBuffers size limit.

To reproduce

Try to load any 2 GB+ model with ONNX Runtime Web, such as SDXL or even Stable Diffusion 1.5 float32 with embedded weights (the float16 version only just fits at 1.7 GB).
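
For example, a minimal repro sketch using onnxruntime-web; the model URL is a placeholder for any 2 GB+ ONNX file with embedded weights, and depending on the version/bundler you may need the WebGPU-enabled bundle:

```ts
import * as ort from 'onnxruntime-web';

// Placeholder URL: any ONNX model whose protobuf, with embedded weights, exceeds 2 GB.
const MODEL_URL = 'https://example.com/sd15-unet-fp32.onnx';

async function repro(): Promise<void> {
  try {
    // Without external data support the entire >2 GB protobuf has to be fetched and
    // parsed as a single buffer, so session creation fails (protobuf parse error or
    // out-of-memory inside the wasm heap).
    const session = await ort.InferenceSession.create(MODEL_URL, {
      executionProviders: ['webgpu'],
    });
    console.log('loaded', session.inputNames, session.outputNames);
  } catch (e) {
    console.error('failed to create session:', e);
  }
}

repro();
```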

Urgency

Sometime after November.

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15

Execution Provider

'webgpu' (WebGPU) + 'webnn' (WebNN)

xenova commented 1 year ago

This would be amazing! 🥳 Something desperately needed (and requested) for Transformers.js.

Feel free to test with any of the models I've already converted and put on the hub: https://huggingface.co/models?library=transformers.js&sort=trending. I have quite a few models with the external format; here are some popular ones which could help with development:

And of course the Llama-2-Onnx repo which I'm sure you're already aware of 😉

dakenf commented 1 year ago

"The kinks are being ironed out for WASM Memory64 in Emscripten and Chromium to support 4GB+ memory space"

Hehe, you're welcome 😎. Chrome Canary 118.0.5951.0 already ships with my fixes for 64-bit memory and wasm threads.

I've made a PR to load weights, but it needs some thought on the overall implementation and another fix for Emscripten, because WASM MEMFS doesn't support files >2 GB since it backs them with an ArrayBuffer (which has a 2 GB limitation). And my hack of substituting the internal file contents with a WebAssembly.Memory instance doesn't work in release builds because those fields and methods are obfuscated: https://github.com/microsoft/onnxruntime/pull/17155

However, it's a bit useless until 64-bit build support is merged, since wasm will crash with out-of-memory on big models, but I'll iron that out in the next few weeks.
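
For context, the general direction here is to get the external weights into (64-bit) wasm memory without routing them through an ArrayBuffer-backed MEMFS file. A rough sketch of that idea, not the PR's actual implementation, with a hypothetical onChunk callback standing in for the copy into wasm memory:

```ts
// Sketch only: stream a large external-weights file chunk by chunk so that no
// single >2 GB ArrayBuffer (MEMFS-style backing store) is ever required.
async function streamWeights(
  url: string,
  onChunk: (chunk: Uint8Array, byteOffset: number) => void,
): Promise<number> {
  const response = await fetch(url);
  if (!response.ok || !response.body) {
    throw new Error(`failed to fetch ${url}: ${response.status}`);
  }
  const reader = response.body.getReader();
  let offset = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    onChunk(value, offset); // e.g. copy straight into the (64-bit) wasm memory
    offset += value.byteLength;
  }
  return offset; // total bytes streamed
}
```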

guschmue commented 6 months ago

Fixed since ORT 1.17.
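
For anyone landing here later, a minimal usage sketch against onnxruntime-web 1.17+; the file names are placeholders, and the exact shape of the externalData session option should be checked against the current onnxruntime-web typings:

```ts
import * as ort from 'onnxruntime-web';

async function load(): Promise<void> {
  // Placeholder file names: a graph-only model plus its external weights file.
  const session = await ort.InferenceSession.create('model.onnx', {
    executionProviders: ['webgpu'],
    // Map each external data file (URL to fetch) to the relative path
    // recorded inside model.onnx.
    externalData: [{ data: 'weights.onnxdata', path: 'weights.onnxdata' }],
  });
  console.log('inputs:', session.inputNames, 'outputs:', session.outputNames);
}

load();
```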