vitoplantamura / OnnxStream

Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a RPI Zero 2 (or in 298MB of RAM) but also Mistral 7B on desktops and servers. ARM, x86, WASM, RISC-V supported. Accelerated by XNNPACK.
https://yolo.vitoplantamura.com/

FP16 mode is always enabled. #3

Closed g8kig closed 1 year ago

g8kig commented 1 year ago

I think sd.cpp line 714 should be:

        model.m_use_fp16_arithmetic = false;

So that the --fp16 flag can enable it (sd.cpp line 599).

vitoplantamura commented 1 year ago

hi,

Line 714 refers to the UNET model, while line 599 refers to the VAE decoder.

For the UNET, the idea is to always use FP16 arithmetic, unless the user specifies the --rpi command line option. The problem is that XnnPack doesn't support FP16 arithmetic on the Raspberry Pi, so the UNET must run in FP32 precision on an RPI 4 or Zero 2.

For the decoder, the problem is different. Two versions of the model are provided: a UINT8 version and an FP16 version. The implementation always uses the UINT8 version, unless the user specifies the --decoder-fp16 option. The FP16 version generates better images than the UINT8 version, but it cannot run on a RPI. So the --decoder-fp16 option is intended to give the user the ability to generate images at the best possible quality if their hardware is capable of doing so.

Vito

g8kig commented 1 year ago

Ah yes got it! Thank you.