vitoplantamura / OnnxStream

Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on an RPI Zero 2 (or in 298MB of RAM), as well as Mistral 7B on desktops and servers. ARM, x86, WASM and RISC-V are supported. Accelerated by XNNPACK.
https://yolo.vitoplantamura.com/

SDXL Turbo support #46

Closed · AeroX2 closed this 11 months ago

AeroX2 commented 11 months ago

Adding SDXL Turbo support!

Turbo mode can be enabled with the --turbo flag, and all the updated weights can be found here: https://huggingface.co/AeroX2/stable-diffusion-xl-turbo-1.0-onnxstream/tree/main. The model can easily do single-step inference and produces some impressive results.

I've done my best to match the existing formatting as closely as possible, but let me know if you'd like anything changed.

AeroX2 commented 11 months ago

This still needs some work on the tiled decoder: I thought I had it working, but it was silently failing and showing the previous result :/

vitoplantamura commented 11 months ago

Perfect! I did some quick tests on my PC: tiled decoding works too!

One detail: on HF, the "sdxl_vae_decoder_32x32_fp16" directory, which is required by the tiled decoder, is missing.

Tomorrow I'll try it on the RPI Zero 2.

Thanks, Vito

vitoplantamura commented 11 months ago

I was cloning the HF models on my RPI Zero, and the SD card exploded :-)

The two models sdxl_vae_decoder_fp16 and sdxl_unet_fp16 are actually in fp32 precision.

Could you convert them with onnx2txt by specifying CONVERT_TO_FP16 = True?

That way, the weights will be half the size and inference should be faster.
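
For reference, this kind of fp32 → fp16 conversion can be sketched as follows. This is only an illustration of what CONVERT_TO_FP16 = True implies, using the onnxconverter-common package, not the actual onnx2txt notebook code; the paths are placeholders:

```python
# Minimal sketch of an fp32 -> fp16 ONNX conversion.
# Assumes the onnx and onnxconverter-common packages are installed;
# this is NOT the onnx2txt notebook itself, and the paths below are placeholders.
import onnx
from onnxconverter_common import float16

model = onnx.load("sdxl_unet_fp32/model.onnx")  # placeholder input path

# Convert fp32 initializers and tensors to fp16, keeping the original
# graph input/output types so existing callers still see fp32 I/O.
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)

onnx.save(model_fp16, "sdxl_unet_fp16/model.onnx")  # placeholder output path
```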

Thanks, Vito

AeroX2 commented 11 months ago

Sure thing, I've just added a new commit on Hugging Face; let me know how it goes.

vitoplantamura commented 11 months ago

Perfect, thanks!

Can I ask you to delete the two fp32 versions (i.e. sdxl_vae_decoder_fp32 and sdxl_unet_fp32)?

They are not used in any code path, and without them the "git clone" is faster and takes up much less disk space for end users.

Thanks, Vito