thu-nics / MixDQ

[ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
https://a-suozhang.xyz/mixdq.github.io/

is MixDQ a PTQ or QAT method? #11

Open LiMa-cas opened 3 months ago

LiMa-cas commented 3 months ago

In base_quantizer.py there is this docstring: "PyTorch Function that can be used for asymmetric quantization (also called uniform affine quantization). Quantizes its argument in the forward pass, passes the gradient 'straight through' on the backward pass, ignoring the quantization that occurred. Based on https://arxiv.org/abs/1806.08342." So is MixDQ a PTQ or QAT method? Does quantization require a backward pass?

A-suozhang commented 3 months ago

Thank you for your interest in our work. MixDQ is a PTQ method that does not require tuning; the code in base_quantizer.py is there simply for compatibility.
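
For context, the docstring quoted above describes the standard "fake quantization with a straight-through estimator (STE)" pattern. A minimal PyTorch sketch of asymmetric (uniform affine) quantization in that style might look like this (names and signatures are illustrative, not the repo's actual API):

```python
import torch

class FakeQuantizeSTE(torch.autograd.Function):
    """Asymmetric (uniform affine) fake quantization.

    Forward: quantize, then dequantize the input.
    Backward: pass the gradient straight through, ignoring quantization.
    """

    @staticmethod
    def forward(ctx, x, scale, zero_point, n_bits=8):
        qmin, qmax = 0, 2 ** n_bits - 1
        # Quantize to integers and clamp to the representable range.
        q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)
        # Dequantize back to float so the rest of the model runs unchanged.
        return (q - zero_point) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: the gradient w.r.t. x passes through
        # unchanged; no gradients for scale/zero_point/n_bits in this sketch.
        return grad_output, None, None, None
```

Because MixDQ is post-training quantization, this backward path is never exercised during calibration; it only keeps the quantized modules autograd-compatible.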

LiMa-cas commented 3 months ago

Thanks a lot!


LiMa-cas commented 3 months ago

path: "/share/public/diffusion_quant/calib_dataset/bs32_t30_sdxl.pt" — Hi, where can I download this file? I need to download all of the files.

A-suozhang commented 3 months ago

You can generate this file yourself by following the instructions in README.md, step 1.1 "Generate Calibration Data":

CUDA_VISIBLE_DEVICES=$1 python scripts/gen_calib_data.py --config ./configs/stable-diffusion/$config_name --save_image_path ./debug_imgs

LiMa-cas commented 3 months ago

Thanks a lot. Another question: at inference time, is it much slower since I need an if/else to decide which precision to dequantize?

A-suozhang commented 3 months ago

I'm not quite sure I fully understand your question, but yes: the code in this repository is "algorithm-level" quantization simulation code and runs slower than FP16. For an actual speedup, customized CUDA kernels that use INT computation are needed (see our Hugging Face demo code: https://huggingface.co/nics-efc/MixDQ).
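
To make the distinction concrete, here is a rough sketch (hypothetical helper names, not the repository's code) of what simulated mixed-precision inference does for a linear layer: weights and activations are fake-quantized to the per-layer bit-widths, but the matmul itself still runs in floating point, so the quantize/dequantize steps add overhead rather than saving time. A real speedup needs an INT GEMM kernel instead.

```python
import torch
import torch.nn.functional as F

def fake_quant(x, n_bits):
    """Asymmetric fake quantization: quantize to n_bits, then dequantize."""
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-x.min() / scale)
    q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)
    return (q - zero_point) * scale

def simulated_mixed_precision_linear(x, weight, w_bits, a_bits):
    # The per-layer bit-widths come from the mixed-precision configuration;
    # looking them up is cheap -- the slowdown comes from the extra
    # quantize/dequantize work wrapped around a floating-point matmul.
    x_q = fake_quant(x, a_bits)
    w_q = fake_quant(weight, w_bits)
    return F.linear(x_q, w_q)  # still an FP16/FP32 GEMM, hence no speedup
```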