greasebig opened this issue 5 months ago
Apologies for the delayed response. There may be an underlying issue in your quantization process. Could you please provide more detailed information about the experimental settings (what does "from scratch" mean), so that we can help you with the problem?
Here is my process:

1.1 Generate calibration data:
CUDA_VISIBLE_DEVICES=$1 python scripts/gen_calib_data.py --config ./configs/stable-diffusion/$config_name --save_image_path ./debug_imgs

1.2 Post-training quantization (PTQ):
CUDA_VISIBLE_DEVICES=$2 python scripts/ptq.py --config ./configs/stable-diffusion/${cfg_name} --outdir ./logs/$1 --seed 42

1.3 Inference with the quantized model:
CUDA_VISIBLE_DEVICES=$1 python scripts/quant_txt2img.py --base_path $CKPT_PATH --batch_size 2 --num_imgs 8
After the above process, I got a result like this:
This process conducts uniform bit-width W8A8 quantization without mixed precision, which produces unsatisfying results. You could try adding the --act_protect flag to the existing command.
Then I tried the mixed precision search process.

Phase 1: PTQ:
python scripts/ptq.py --config ./configs/stable-diffusion/sdxl-turbo.yaml --outdir --seed 42

Phase 2: Get Sensitivity: ...
Phase 3: Integer Programming: ...
Phase 4: Choose the optimal config: ...

Inference with the mixed precision quantized model (I used my own optimal config obtained from the above process):
python scripts/quant_txt2img.py --base_path ./logs/sdxl-turbo-1024fp32 --config_weight_mp ./logs/sdxl-turbo-1024fp32/weight_4.73_0.96.yaml --config_act_mp ./logs/sdxl-turbo-1024fp32/act_7.50_0.95.yaml --act_protect ./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt --image_folder ./logs/sdxl-turbo-1024fp32/generated_images_weight_4.73_0.96_act_7.50_0.95

I got a result like this:
I have tried --act_protect here.
Actually, I meant trying to add act_protect to the uniform bit-width W8A8 command:
CUDA_VISIBLE_DEVICES=$1 python scripts/quant_txt2img.py --base_path $CKPT_PATH --batch_size 2 --num_imgs 8 --act_protect ./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt
This phenomenon is probably due to a sub-optimal mixed precision configuration. Does the mixed precision search process raise any errors?
I needed to change your code so that --act_protect can be used without --config_act_mp.
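For reference, a minimal sketch of what such a change could look like; this is not MixDQ's actual code, and the argument handling, flag names, and helper below are assumptions:

```python
# Hypothetical sketch (not MixDQ's actual quant_txt2img.py): decouple
# --act_protect from --config_act_mp so the uniform W8A8 path can also
# keep the sensitive layers in FP16.
import argparse
import torch

def load_protected_layers(act_protect_path):
    """Load the list of sensitive layer names saved in the .pt file."""
    return torch.load(act_protect_path)

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_act_mp", default=None)
    parser.add_argument("--act_protect", default=None)
    args = parser.parse_args()

    protected = []
    # Before: act_protect was only consumed on the mixed precision path.
    # After: consume it whenever it is given, even without config_act_mp.
    if args.act_protect is not None:
        protected = load_protected_layers(args.act_protect)

    # The quantized model would then skip activation quantization for
    # these layers, keeping them in FP16 (details depend on MixDQ's
    # QuantModel internals, which are not shown here).
    print(f"{len(protected)} layers kept in FP16")

if __name__ == "__main__":
    main()
```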
I just followed your steps to search for a mixed precision config and didn't get any errors.
I may try it later.
I see. Full W8A8 quantization will generate images with visual degradation. Therefore, we identify the top 1% most sensitive layers and preserve them in FP to maintain performance; to generate images with good quality, act_protect should be specified.
We designed the mixed precision W8A8 configuration to use 8 bits for all weights (weight_8.00.yaml) and an average of 7.77 bits for activations, so that after accounting for the FP16 layers, the average activation bit-width comes out to 8 bits.
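As a toy illustration of that accounting (a minimal sketch; whether MixDQ averages per layer or weights by tensor size is an assumption here):

```python
# Toy sketch of the bit-width accounting: protected layers stay FP16
# (counted as 16 bit), so the remaining quantized layers must average
# slightly below 8 bit for the overall average to stay at 8 bit.
# Per-layer averaging is an assumption; MixDQ may weight differently.
n_layers, n_protected = 100, 1  # e.g. top 1% of layers protected

# Average bits the quantized layers may use so the overall mean is 8:
quant_avg = (8 * n_layers - 16 * n_protected) / (n_layers - n_protected)
print(f"quantized layers can average {quant_avg:.2f} bit")  # 7.92

# Overall average with protected layers counted as 16 bit:
overall = (16 * n_protected + quant_avg * (n_layers - n_protected)) / n_layers
print(f"overall average: {overall:.2f} bit")  # 8.00
```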
(The example commands we provide:)
# Mixed Precision Quant Inference
WEIGHT_MP_CFG="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/weight/weight_8.00.yaml" # [weight_5.02.yaml, weight_8.00.yaml]
ACT_MP_CFG="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_7.77.yaml"
ACT_PROTECT="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt"
If you want a full W8A8 model, simply changing all the bit-widths in act_7.77.yaml to 8 should work.
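For instance, a minimal sketch of that edit with PyYAML, assuming act_7.77.yaml is a flat mapping from layer names to integer bit-widths (please check the actual file structure first):

```python
# Minimal sketch: set every bit-width in act_7.77.yaml to 8.
# Assumes a flat {layer_name: bits} mapping; adjust if the actual
# config nests the bit-widths differently.
import yaml

path = "./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_7.77.yaml"

with open(path) as f:
    cfg = yaml.safe_load(f)

cfg_w8a8 = {name: 8 for name in cfg}

with open("act_8.00.yaml", "w") as f:
    yaml.safe_dump(cfg_w8a8, f)
```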
I know these commands; using them produces good results. But my question is how to obtain these configs.
I tried the steps listed in https://github.com/A-suozhang/MixDQ/blob/master/mixed_precision_scripts/mixed_precision_search.md; however, they only generate blurry images like the ones I posted above.
If you simply want to conduct W8A8 quantization, you could set all bit-widths in WEIGHT_MP_CFG and ACT_MP_CFG to 8 bits, together with the act_protect layers.
If you want to search for your own mixed precision configuration: after acquiring the layer sensitivity, you may need to run the integer programming step multiple times with different seeds / target bit-widths to generate a few candidate mixed precision configurations, then select the optimal one based on the visual quality of the actual generated images (see the sketch below).
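As a sketch, that repetition could be scripted as below; the script path and flag names are placeholders, not MixDQ's actual CLI, so substitute the real integer programming command from mixed_precision_search.md:

```python
# Hypothetical sketch: generate candidate configs by re-running the
# integer programming step with different seeds and target bit-widths.
# "integer_programming.py" and its flags are placeholders; use the
# actual command from mixed_precision_search.md instead.
import itertools
import subprocess

seeds = [42, 43, 44]
target_act_bits = [7.5, 7.77, 8.0]

for seed, bits in itertools.product(seeds, target_act_bits):
    subprocess.run(
        ["python", "mixed_precision_scripts/integer_programming.py",
         "--seed", str(seed), "--target_act_bit", str(bits)],
        check=True,
    )
# Then generate images with each candidate config and keep the one
# with the best visual quality.
```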
For more details of the search process, you may refer to the Appendix of our paper. Sorry for the unclear description in mixed_precision_search.md; we will revise it to make it clearer.
Also, I hope you can disclose more about the process of acquiring the act_protect config used in mixed_precision_search.md:

ACT_PROTECT="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt"

Currently the document just uses it directly but doesn't show how to obtain it.
The act_sensitivie_a8_1% file contains the top 1% of layers according to layer sensitivity. Specifically, we choose the top 1% of layers in each group, ranked by different metrics. We will supplement this part of the code in a future update.
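Based on that description, a rough sketch of the selection might look like the following; the input format and grouping are assumptions, since this part of the code has not been released yet:

```python
# Hypothetical sketch: pick the top 1% most sensitive layers per group
# and save their names, mimicking act_sensitivie_a8_1%.pt. The input
# format (layer name -> (group, score)) is an assumption.
import math
import torch

def top1pct_per_group(sensitivity):
    """sensitivity: {layer_name: (group_name, score)}"""
    groups = {}
    for name, (group, score) in sensitivity.items():
        groups.setdefault(group, []).append((score, name))

    protected = []
    for group, items in groups.items():
        items.sort(reverse=True)                   # most sensitive first
        k = max(1, math.ceil(0.01 * len(items)))   # top 1%, at least one
        protected += [name for _, name in items[:k]]
    return protected

# Usage: torch.save(top1pct_per_group(sens), "act_protect_1pct.pt")
```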
I am really curious whether your results can be fully reproduced. Using your quantized configs gives good results, but when I quantize sdxl-turbo from "scratch", I cannot get the expected results, only blurry images.