webonnx / wonnx

A WebGPU-accelerated ONNX inference run-time written 100% in Rust, ready for native and the web

`f16` support #172

Open · SludgePhD opened 1 year ago

SludgePhD commented 1 year ago

Network weights can be stored as f16 floats, halving the size of the network, which is often very desirable.

It would be nice if wonnx could support loading networks that do that. WebGPU has native support for f16 ("half-precision") floats, so all GPU buffers could store them natively. Rust has no native f16 type, however, so all network inputs and outputs would have to be converted at the boundary.
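A minimal sketch of that host-side boundary conversion, assuming the `half` crate as a dependency; the helper names are illustrative, not existing wonnx API:

```rust
use half::f16;

/// Narrow f32 tensor data to little-endian f16 bytes for upload
/// into a GPU buffer (hypothetical helper, not wonnx API).
fn f32_to_f16_bytes(data: &[f32]) -> Vec<u8> {
    data.iter()
        .flat_map(|&x| f16::from_f32(x).to_le_bytes())
        .collect()
}

/// Widen f16 bytes read back from the GPU into f32 for the caller.
fn f16_bytes_to_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|c| f16::from_le_bytes([c[0], c[1]]).to_f32())
        .collect()
}
```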

FL33TW00D commented 1 year ago

> Network weights can be stored as f16 floats, halving the size of the network, which is often very desirable.
>
> It would be nice if wonnx could support loading networks that do that. WebGPU has native support for f16 ("half-precision") floats, so all GPU buffers could store them natively. Rust has no native f16 type, however, so all network inputs and outputs would have to be converted at the boundary.

`f16` is not supported in Naga yet (https://github.com/gfx-rs/wgpu/issues/4384), and it has not shipped in Chrome yet either (https://bugs.chromium.org/p/dawn/issues/detail?id=1775&q=f16&can=2).
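Once that lands, f16 shaders would still be gated behind WebGPU's optional `shader-f16` feature, so wonnx would need to probe for it at runtime. A sketch of such a probe via wgpu; `Features::SHADER_F16` is real wgpu API, but the helper itself is hypothetical and the exact `Instance`/`request_adapter` signatures vary across wgpu versions:

```rust
/// Probe whether the adapter exposes WebGPU's optional shader-f16
/// feature before deciding how to lower the network (sketch only;
/// assumes a wgpu version where request_adapter returns Option<Adapter>).
async fn f16_shaders_available() -> bool {
    let instance = wgpu::Instance::default();
    let adapter = match instance
        .request_adapter(&wgpu::RequestAdapterOptions::default())
        .await
    {
        Some(adapter) => adapter,
        None => return false,
    };
    adapter.features().contains(wgpu::Features::SHADER_F16)
}
```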

SludgePhD commented 1 year ago

Ah, that's unfortunate. In that case, wonnx could still up-convert the f16 weights to f32 when loading such models, so they at least work.
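A minimal sketch of that load-time fallback, again assuming the `half` crate: decode the raw little-endian f16 bytes of an ONNX FLOAT16 initializer and widen them to f32, so the rest of the pipeline stays f32-only. The function name is illustrative, not wonnx API:

```rust
use half::f16;

/// Widen a FLOAT16 initializer's raw_data (little-endian 2-byte
/// values, per the ONNX spec) into an f32 tensor at model load time.
fn upconvert_f16_initializer(raw_data: &[u8]) -> Vec<f32> {
    raw_data
        .chunks_exact(2)
        .map(|c| f16::from_le_bytes([c[0], c[1]]).to_f32())
        .collect()
}
```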