Closed by stellaraccident 4 months ago
https://github.com/nod-ai/SHARK-Turbine/commit/e46a2a226938ba4f4b8ee23a11959db901194eb7 switches the default f16 path to use this model. Perhaps there is a better way to instantiate it: we are essentially using the base sdxl-vae VAE config and pulling in the amd-shark quantized VAE weights as a state dict.
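For reference, a minimal sketch of that instantiation path using diffusers (the weight filename in the amd-shark repo is an assumption here; adjust to whatever the repo actually ships):

```python
import torch
from diffusers import AutoencoderKL
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Start from the standard SDXL VAE config/architecture.
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae", torch_dtype=torch.float16)

# Pull the FP16-safe weights from the amd-shark repo and load them as a state dict.
weights_path = hf_hub_download(
    repo_id="amd-shark/sdxl-quant-models",
    filename="vae/diffusion_pytorch_model.safetensors",  # hypothetical filename
)
vae.load_state_dict(load_file(weights_path))
```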
Don't think of this as a "quantized VAE". We've just applied some linear transformations to the weights of some layers of the VAE so that the internal activations do not overflow FP16. As such, the state dict should be compatible with a standard VAE config. There are no quantization parameters associated with this network.
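To illustrate the kind of weight-space rescaling being described (a toy example, not the actual transformation applied to the amd-shark VAE): scaling one layer's weights down and a following layer's weights up leaves the composed function unchanged while shrinking the intermediate activation magnitudes.

```python
import torch

# Two linear layers with a large intermediate activation.
lin1 = torch.nn.Linear(4, 4, bias=False)
lin2 = torch.nn.Linear(4, 4, bias=False)
x = torch.randn(1, 4) * 1e3

y_ref = lin2(lin1(x))

# Rescale: intermediate activations shrink by s, the next layer compensates.
# (Exact equivalence holds for purely linear compositions; with a nonlinearity
# in between, the transformation has to respect it.)
s = 256.0
with torch.no_grad():
    lin1.weight /= s
    lin2.weight *= s

torch.testing.assert_close(lin2(lin1(x)), y_ref)
```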
TL;DR - I think what you're doing is the correct approach.
We've uploaded a hermetically built FP16 VAE to https://huggingface.co/amd-shark/sdxl-quant-models/tree/main/vae
Let's clean this up and use it by default in the pipeline.
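A rough sketch of what defaulting to the uploaded VAE could look like in a standard diffusers pipeline (assuming the `vae/` folder carries a regular diffusers config plus weights; the pipeline class and model IDs here are the stock diffusers ones, not necessarily what SHARK-Turbine uses internally):

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

# Load the FP16-safe VAE directly from the amd-shark repo.
vae = AutoencoderKL.from_pretrained(
    "amd-shark/sdxl-quant-models", subfolder="vae", torch_dtype=torch.float16
)

# Use it by default in the SDXL pipeline instead of the stock VAE.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
)
```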