Closed: mak448a closed this issue 7 months ago
We are already using a quantized model; for example, the OpenVINO SDXL Turbo model is INT8-quantized (the original model is >12 GB, while the quantized model is <6 GB). I did some experimentation with it and didn't see any performance boost, although the model size is obviously smaller. https://huggingface.co/rupeshs/sdxl-turbo-openvino-int8
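A small, self-contained sketch (illustrative only, not OpenVINO's actual quantization scheme) of why INT8 quantization shrinks the weights roughly 4x but does not by itself make inference faster: a naive runtime still dequantizes back to fp32 before the matmul, so you only gain speed if the backend has real INT8 kernels.

```python
import numpy as np

# Simulate one layer's fp32 weights (hypothetical sizes).
w = np.random.randn(256, 256).astype(np.float32)

# Simple symmetric per-tensor INT8 quantization (illustrative).
scale = np.abs(w).max() / 127.0
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Storage shrinks 4x (4-byte fp32 -> 1-byte int8).
print(w.nbytes // w_int8.nbytes)  # 4

# But naive inference dequantizes back to fp32 before computing,
# so without dedicated INT8 kernels there is no speedup.
w_deq = w_int8.astype(np.float32) * scale

# Quantization error stays within one quantization step.
print(float(np.abs(w - w_deq).max()) <= scale)  # True
```

This matches the observation above: the file on disk is much smaller, but runtime speed depends on whether the execution backend actually runs INT8 math.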
Can we get even more performance if we quantize the models first? If so, it would be nice to add it. Thanks!