greasebig opened this issue 5 months ago
Apologies for the delayed response. There may be an underlying issue in your quantization process. Could you please provide more detailed information about the experimental settings (what does "from scratch" mean), so that we can help you with the problem?
Here is my process:

1.1 Generate calibration data:
CUDA_VISIBLE_DEVICES=$1 python scripts/gen_calib_data.py --config ./configs/stable-diffusion/$config_name --save_image_path ./debug_imgs

1.2 Post-training quantization (PTQ):
CUDA_VISIBLE_DEVICES=$2 python scripts/ptq.py --config ./configs/stable-diffusion/${cfg_name} --outdir ./logs/$1 --seed 42

1.3 Inference with the quantized model:
CUDA_VISIBLE_DEVICES=$1 python scripts/quant_txt2img.py --base_path $CKPT_PATH --batch_size 2 --num_imgs 8
After the above process, I got a result like this:
This process conducts uniform bit-width W8A8 quantization without mixed precision, which produces unsatisfying results. You could try adding the --act_protect flag to the existing command.
Then I tried the mixed precision search process.

Phase 1: PTQ:
python scripts/ptq.py --config ./configs/stable-diffusion/sdxl-turbo.yaml --outdir --seed 42

Phase 2: Get Sensitivity: ...
Phase 3: Integer Programming: ...
Phase 4: Choose the optimal config: ...

Inference with the mixed precision quantized model (I used my own optimal config obtained from the above process):
python scripts/quant_txt2img.py --base_path ./logs/sdxl-turbo-1024fp32 --config_weight_mp ./logs/sdxl-turbo-1024fp32/weight_4.73_0.96.yaml --config_act_mp ./logs/sdxl-turbo-1024fp32/act_7.50_0.95.yaml --act_protect ./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt --image_folder ./logs/sdxl-turbo-1024fp32/generated_images_weight_4.73_0.96_act_7.50_0.95

I got a result like this:
I have tried --act_protect here.
Actually, I meant trying to add act_protect to the uniform bit-width W8A8 command:
CUDA_VISIBLE_DEVICES=$1 python scripts/quant_txt2img.py --base_path $CKPT_PATH --batch_size 2 --num_imgs 8 --act_protect ./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt
This phenomenon is probably due to a sub-optimal mixed precision configuration. Does the mixed precision search process raise any errors?
I needed to change your code so that --act_protect can be used without --config_act_mp.
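For reference, a minimal sketch of what such a change could look like; this is not MixDQ's actual code, and the argument handling, flag names, and helper below are assumptions:

```python
# Hypothetical sketch (not MixDQ's actual quant_txt2img.py): decouple
# --act_protect from --config_act_mp so the uniform W8A8 path can also
# keep the sensitive layers in FP16.
import argparse
import torch

def load_protected_layers(act_protect_path):
    """Load the list of sensitive layer names saved in the .pt file."""
    return torch.load(act_protect_path)

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_act_mp", default=None)
    parser.add_argument("--act_protect", default=None)
    args = parser.parse_args()

    protected = []
    # Before: act_protect was only consumed on the mixed precision path.
    # After: consume it whenever it is given, even without config_act_mp.
    if args.act_protect is not None:
        protected = load_protected_layers(args.act_protect)

    # The quantized model would then skip activation quantization for
    # these layers, keeping them in FP16 (details depend on MixDQ's
    # QuantModel internals, which are not shown here).
    print(f"{len(protected)} layers kept in FP16")

if __name__ == "__main__":
    main()
```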
I just followed your steps to search for a mixed precision config and didn't get any errors.
I may try it later.
I see. Full W8A8 quantization will generate images with visual degradation. Therefore, we identify the top 1% most sensitive layers and preserve them in FP to maintain performance; to generate images with good quality, act_protect should be specified.
We designed the mixed precision W8A8 configuration to use 8 bits for all weights (weight_8.00.yaml) and an average of 7.77 bits for activations, so that after accounting for the FP16 layers, the average activation bit-width comes out to 8 bits.
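As a toy illustration of that accounting (a minimal sketch; whether MixDQ averages per layer or weights by tensor size is an assumption here):

```python
# Toy sketch of the bit-width accounting: protected layers stay FP16
# (counted as 16 bit), so the remaining quantized layers must average
# slightly below 8 bit for the overall average to stay at 8 bit.
# Per-layer averaging is an assumption; MixDQ may weight differently.
n_layers, n_protected = 100, 1  # e.g. top 1% of layers protected

# Average bits the quantized layers may use so the overall mean is 8:
quant_avg = (8 * n_layers - 16 * n_protected) / (n_layers - n_protected)
print(f"quantized layers can average {quant_avg:.2f} bit")  # 7.92

# Overall average with protected layers counted as 16 bit:
overall = (16 * n_protected + quant_avg * (n_layers - n_protected)) / n_layers
print(f"overall average: {overall:.2f} bit")  # 8.00
```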
(The example commands we provide:)
# Mixed Precision Quant Inference
WEIGHT_MP_CFG="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/weight/weight_8.00.yaml" # [weight_5.02.yaml, weight_8.00.yaml]
ACT_MP_CFG="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_7.77.yaml"
ACT_PROTECT="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt"
If you want a full W8A8 model, simply changing all the bit-widths in act_7.77.yaml to 8 should work.
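For instance, a minimal sketch of that edit with PyYAML, assuming act_7.77.yaml is a flat mapping from layer names to integer bit-widths (please check the actual file structure first):

```python
# Minimal sketch: set every bit-width in act_7.77.yaml to 8.
# Assumes a flat {layer_name: bits} mapping; adjust if the actual
# config nests the bit-widths differently.
import yaml

path = "./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_7.77.yaml"

with open(path) as f:
    cfg = yaml.safe_load(f)

cfg_w8a8 = {name: 8 for name in cfg}

with open("act_8.00.yaml", "w") as f:
    yaml.safe_dump(cfg_w8a8, f)
```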
I know these commands; using them produces good results. But my question is how to obtain these configs.
I tried the steps listed in https://github.com/A-suozhang/MixDQ/blob/master/mixed_precision_scripts/mixed_precision_search.md; however, they only generate blurry images like the ones I posted above.
If you simply want to conduct W8A8 quantization, you could set all bit-widths in WEIGHT_MP_CFG and ACT_MP_CFG to 8 bits, together with the act_protect layers.
If you want to search for your own mixed precision configuration: after acquiring the layer sensitivity, you may need to run the integer programming step multiple times with different seeds / target bit-widths to generate a few candidate mixed precision configurations, then select the optimal one based on the visual quality of the actual generated images (see the sketch below).
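As a sketch, that repetition could be scripted as below; the script path and flag names are placeholders, not MixDQ's actual CLI, so substitute the real integer programming command from mixed_precision_search.md:

```python
# Hypothetical sketch: generate candidate configs by re-running the
# integer programming step with different seeds and target bit-widths.
# "integer_programming.py" and its flags are placeholders; use the
# actual command from mixed_precision_search.md instead.
import itertools
import subprocess

seeds = [42, 43, 44]
target_act_bits = [7.5, 7.77, 8.0]

for seed, bits in itertools.product(seeds, target_act_bits):
    subprocess.run(
        ["python", "mixed_precision_scripts/integer_programming.py",
         "--seed", str(seed), "--target_act_bit", str(bits)],
        check=True,
    )
# Then generate images with each candidate config and keep the one
# with the best visual quality.
```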
For more details of the search process, you may refer to the Appendix of our paper. Sorry for the unclear description in mixed_precision_search.md; we will revise it to make it clearer.
Also, I hope you can disclose more about the process of acquiring the act_protect config used in mixed_precision_search.md:

ACT_PROTECT="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt"

Currently the document just uses it directly but doesn't show how to obtain it.
The act_sensitivie_a8_1% file contains the top 1% of layers according to layer sensitivity. Specifically, we choose the top 1% of layers in each group, ranked by different metrics. We will supplement this part of the code in a future update.
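Based on that description, a rough sketch of the selection might look like the following; the input format and grouping are assumptions, since this part of the code has not been released yet:

```python
# Hypothetical sketch: pick the top 1% most sensitive layers per group
# and save their names, mimicking act_sensitivie_a8_1%.pt. The input
# format (layer name -> (group, score)) is an assumption.
import math
import torch

def top1pct_per_group(sensitivity):
    """sensitivity: {layer_name: (group_name, score)}"""
    groups = {}
    for name, (group, score) in sensitivity.items():
        groups.setdefault(group, []).append((score, name))

    protected = []
    for group, items in groups.items():
        items.sort(reverse=True)                   # most sensitive first
        k = max(1, math.ceil(0.01 * len(items)))   # top 1%, at least one
        protected += [name for _, name in items[:k]]
    return protected

# Usage: torch.save(top1pct_per_group(sens), "act_protect_1pct.pt")
```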
I am really curious whether your results can be fully reproduced. Using your quantized configs gives good results, but when I quantize sdxl-turbo from "scratch", I cannot get the expected results, only blurry images.