In your paper you say it can be used with W4A8, but I see the following code:
```python
# with open(w_config, 'r') as input_file:
if w_bit == 8:
    mod_name_to_weight_width = w8_uniform_config
else:
    raise RuntimeError("we only support int8 quantization")
# filter 'model.' from all names
```
I may need further clarification to fully grasp your question. Is the above code, containing "we only support int8 quantization", from our Hugging Face demo?
The algorithm-level quantization simulation code (on GitHub) supports mixed precision, including W4A8. The system-level quantization code (on Hugging Face, including the CUDA kernel) only supports W8A8 for now; we are still working on the mixed-precision CUDA kernel implementation.
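To make the distinction concrete, here is a minimal sketch of how a per-module weight-width map could express mixed precision in the simulation code, mirroring the `w8_uniform_config` pattern from the pasted snippet. The module names and the `w4a8_mixed_config` / `get_weight_width_config` names are illustrative assumptions, not the repository's actual configuration.

```python
# Sketch (assumed names): mixed precision as a per-module weight-width map.
# With W4A8, some weight modules stay at 8 bits and the rest use 4 bits,
# while activations are kept at 8 bits throughout.
w8_uniform_config = {          # uniform W8: every weight module gets 8 bits
    "down_blocks.0.attentions.0.proj_in": 8,
    "mid_block.resnets.0.conv1": 8,
}
w4a8_mixed_config = {          # mixed W4A8: sensitive modules keep 8 bits
    "down_blocks.0.attentions.0.proj_in": 8,
    "mid_block.resnets.0.conv1": 4,
}

def get_weight_width_config(w_bit: int) -> dict:
    # Extends the branch in the pasted snippet: return a mixed-precision
    # map instead of raising for non-8-bit settings.
    if w_bit == 8:
        return w8_uniform_config
    elif w_bit == 4:
        return w4a8_mixed_config
    raise RuntimeError(f"unsupported weight bit-width: {w_bit}")
```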
Yes, the code I pasted above is from https://huggingface.co/nics-efc/MixDQ/tree/main.
Thanks for your reply. I have another question: can your W8A8 pipeline be used in WebUI or ComfyUI?
Currently, our code is built on the Hugging Face diffusers package as a customized pipeline. If WebUI or ComfyUI can embed a diffusers pipeline (I know that ComfyUI supports some of the diffusers models), then our code could be used directly.
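For illustration, diffusers can load a customized pipeline through the `custom_pipeline` argument of `DiffusionPipeline.from_pretrained`; the sketch below shows the kind of embedding a WebUI/ComfyUI integration would need to support. The base model id and whether the MixDQ repo can be loaded exactly this way are assumptions.

```python
# Minimal sketch (assumptions: the base model id and that the MixDQ repo
# exposes its pipeline.py as a diffusers custom pipeline).
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/sdxl-turbo",          # base SDXL checkpoint (assumed)
    custom_pipeline="nics-efc/MixDQ",  # custom pipeline repo (assumed usage)
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(prompt="an astronaut riding a horse",
             num_inference_steps=4,
             guidance_scale=0.0).images[0]
image.save("out.png")
```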
In pipeline.py it says: "This function helps quantize the UNet in the SDXL Pipeline. Now we only support quantization with the setting W8A8."
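As a generic illustration of what the W8A8 setting means (not the implementation in pipeline.py): both the weights and the activations of a layer are quantized to 8-bit integers, for example via symmetric per-tensor fake quantization.

```python
# Generic W8A8 illustration, not MixDQ's code: fake-quantize both the
# weights (W8) and the activations (A8) of a layer to 8-bit integers.
import torch

def fake_quant_sym(x: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Symmetric per-tensor fake quantization: quantize to int, then dequantize."""
    qmax = 2 ** (n_bits - 1) - 1                  # 127 for 8-bit
    scale = x.abs().max().clamp(min=1e-8) / qmax  # per-tensor scale
    return (x / scale).round().clamp(-qmax, qmax) * scale

layer = torch.nn.Linear(64, 64)
x = torch.randn(4, 64)

w_q = fake_quant_sym(layer.weight.data, n_bits=8)  # W8: 8-bit weights
x_q = fake_quant_sym(x, n_bits=8)                  # A8: 8-bit activations
y = torch.nn.functional.linear(x_q, w_q, layer.bias)
```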