Closed by stellaraccident 4 months ago
https://github.com/nod-ai/SHARK-Turbine/commit/e46a2a226938ba4f4b8ee23a11959db901194eb7 switches the default f16 path to use this model. Perhaps there is a better way to instantiate it: we are essentially using the base sdxl-vae VAE config and pulling in the amd-shark quantized VAE weights as a state dict.
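For reference, a minimal sketch of that instantiation path using diffusers (the weight filename in the amd-shark repo is an assumption here; adjust to whatever the repo actually ships):

```python
import torch
from diffusers import AutoencoderKL
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Start from the standard SDXL VAE config/architecture.
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae", torch_dtype=torch.float16)

# Pull the FP16-safe weights from the amd-shark repo and load them as a state dict.
weights_path = hf_hub_download(
    repo_id="amd-shark/sdxl-quant-models",
    filename="vae/diffusion_pytorch_model.safetensors",  # hypothetical filename
)
vae.load_state_dict(load_file(weights_path))
```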
Don't think of this as a "quantized VAE". We've just applied some linear transformations to the weights of some layers of the VAE so that the internal activations do not overflow FP16. As such, the state dict should be compatible with a standard VAE config. There are no quantization parameters associated with this network.
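To illustrate the kind of weight-space rescaling being described (a toy example, not the actual transformation applied to the amd-shark VAE): scaling one layer's weights down and a following layer's weights up leaves the composed function unchanged while shrinking the intermediate activation magnitudes.

```python
import torch

# Two linear layers with a large intermediate activation.
lin1 = torch.nn.Linear(4, 4, bias=False)
lin2 = torch.nn.Linear(4, 4, bias=False)
x = torch.randn(1, 4) * 1e3

y_ref = lin2(lin1(x))

# Rescale: intermediate activations shrink by s, the next layer compensates.
# (Exact equivalence holds for purely linear compositions; with a nonlinearity
# in between, the transformation has to respect it.)
s = 256.0
with torch.no_grad():
    lin1.weight /= s
    lin2.weight *= s

torch.testing.assert_close(lin2(lin1(x)), y_ref)
```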
TL;DR - I think what you're doing is the correct approach.
We've uploaded a hermetically built FP16 VAE to https://huggingface.co/amd-shark/sdxl-quant-models/tree/main/vae
Let's clean this up and use it by default in the pipeline.
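A rough sketch of what defaulting to the uploaded VAE could look like in a standard diffusers pipeline (assuming the `vae/` folder carries a regular diffusers config plus weights; the pipeline class and model IDs here are the stock diffusers ones, not necessarily what SHARK-Turbine uses internally):

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

# Load the FP16-safe VAE directly from the amd-shark repo.
vae = AutoencoderKL.from_pretrained(
    "amd-shark/sdxl-quant-models", subfolder="vae", torch_dtype=torch.float16
)

# Use it by default in the SDXL pipeline instead of the stock VAE.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
)
```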