siliconflow / onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.
https://github.com/siliconflow/onediff/wiki
Apache License 2.0

Quant: only small speedups on A100 #761

Closed sandor-lisn closed 5 months ago

sandor-lisn commented 5 months ago

Describe the bug

Using quantization yields only minimal speedups on an A100.

Your environment

OS

$ uname -a
Linux jean-zay4 4.18.0-372.91.1.el8_6.x86_64 #1 SMP Tue Jan 30 11:06:32 EST 2024 x86_64 x86_64 x86_64 GNU/Linux

OneDiff git commit id

85840be35d24d7b06efc9028b6322b23845ab1fe

OneFlow version info

path: ['/gpfswork/rech/xyi/utb83nr/miniconda3/envs/onediff/lib/python3.10/site-packages/oneflow']
version: 0.9.1.dev20240322+cu121
git_commit: 0523481
cmake_build_type: Release
rdma: True
mlir: True
enterprise: True

How To Reproduce

quant.json non-quant.json

The complete error message

N/A

Additional context

N/A

strint commented 5 months ago

Which quantized model are you using from here: https://huggingface.co/collections/siliconflow/onediff-enterprise-for-comfyui-65aa4d455860f06ff2808f05? @sandor-lisn

sandor-lisn commented 5 months ago

I was trying to follow your instructions here: https://huggingface.co/siliconflow/sdxl-base-1.0-onediff-comfy-enterprise-v1

I put https://huggingface.co/siliconflow/sdxl-base-1.0-onediff-comfy-enterprise-v1/resolve/main/sd_xl_base_1.0_quantize_info.pt?download=true into ComfyUI/models/onediff_quant

As the checkpoint, I used sd_xl_base_1.0_0.9vae.safetensors loaded from ComfyUI/models/checkpoints.

hjchen2 commented 5 months ago

The comfy enterprise v1 model is quantized together with DeepCache. Since DeepCache already causes some loss of quality, only part of the linear modules are quantized in this model in order to preserve quality, and the conv modules are not quantized at all.
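
If you want to see how much of the UNet that quantize_info file actually covers, something along the lines of the sketch below can help. It assumes the .pt file loads with torch.load and behaves like a dict keyed by module name; that is an assumption about the format, not something documented here, so adapt it to whatever the file really contains.

```python
# Rough sketch: count conv-like vs. other entries in a quantize_info file.
# Assumes the .pt file loads as a dict keyed by module name (an assumption;
# the real format may differ).
import torch

path = "ComfyUI/models/onediff_quant/sd_xl_base_1.0_quantize_info.pt"
info = torch.load(path, map_location="cpu")

conv_keys = [k for k in info if "conv" in str(k).lower()]
other_keys = [k for k in info if "conv" not in str(k).lower()]
print(f"total entries: {len(info)}, conv-like: {len(conv_keys)}, "
      f"other (linear/attention): {len(other_keys)}")
```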

~You can try to use comfy's quantization workflow to quantize an sdxl-base model without DeepCache yourself. Please refer to https://github.com/siliconflow/onediff/tree/main/onediff_comfy_nodes#quantization~

@doombeaker Can you provide a comfy quantization workflow for sandor?

doombeaker commented 5 months ago

@sandor-lisn

Here is the quantization workflow; you can quantize the model yourself and set the related hyperparameters below as needed.

The "Quant K Sampler" is the key node contains almost all the hype parameters.

| Parameter name | Description | Type | Default value |
| --- | --- | --- | --- |
| bits | Number of bits | INT | 8 |
| quantize_conv | Whether to quantize the convolution layer | STRING | enable |
| quantize_linear | Whether to quantize the linear layer | STRING | enable |
| conv_mse_threshold | MSE threshold for quantizing vs not quantizing this convolution layer | FLOAT | 0.1 |
| linear_mse_threshold | MSE threshold for quantizing vs not quantizing this linear layer | FLOAT | 0.1 |
| compute_density_threshold | When the computation density of this layer is below this threshold, do not quantize | INT | 0 |
| save_filename_prefix | Prefix of the configuration file name for the saved quantized model | STRING | unet |
| overwrite | Whether to overwrite the existing quantization configuration file | STRING | enable |
| static_mode | Use oneflow_compile when calculating output | STRING | enable |

The quantization configuration file will be saved in ComfyUI/models/onediff_quant/.

mse_threshold: the smaller the MSE, the more similar the quantized output is to the original one.

Comfy's new quantization tool takes about 35 minutes to quantize an SDXL model (image: 1024x1024, steps: 20, NVIDIA A100-PCIE-40GB).
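
To make the mse_threshold idea concrete, here is a small self-contained sketch of the general selection logic (it is not onediff's actual implementation): simulate an int8 round trip of a layer's weight, compare the resulting output with the float output, and keep the layer quantized only if the MSE stays below the threshold.

```python
# Illustrative sketch of MSE-threshold layer selection (not onediff's code):
# a layer stays quantized only if the error it introduces is small enough.
import torch
import torch.nn.functional as F

def fake_int8_quant(w: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor int8 round trip, used only to estimate the error.
    scale = w.abs().max() / 127.0
    return torch.round(w / scale).clamp(-128, 127) * scale

def should_quantize(layer: torch.nn.Linear, x: torch.Tensor,
                    mse_threshold: float = 0.1) -> bool:
    with torch.no_grad():
        ref = layer(x)                                                 # float output
        out = F.linear(x, fake_int8_quant(layer.weight), layer.bias)  # simulated int8 output
        mse = torch.mean((ref - out) ** 2).item()
    return mse < mse_threshold

layer = torch.nn.Linear(320, 320)
x = torch.randn(4, 320)
print(should_quantize(layer, x, mse_threshold=0.1))
```

compute_density_threshold plays a similar gating role, but based on how much arithmetic a layer does rather than on its error, so layers that do very little compute, and presumably gain little from int8, can be skipped.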

hjchen2 commented 5 months ago

@sandor-lisn

And the DeepCache SpeedUp node should be disabled in this workflow. The recommended value of compute_density_threshold is 300, with the default values kept for the other parameters.

sandor-lisn commented 5 months ago

Thank you very much for your detailed answers!

I think that I understand a little better now, but I am still confused... Is there any documentation about the interplay of deepcache, quantization, and onediff? It would be very helpful for me to better understand the bigger picture.

A problem for me is that I can't run the workflow that doombeaker posted above. I am having serious trouble running DeepCache; see my other open issue: https://github.com/siliconflow/onediff/issues/763

strint commented 5 months ago

This issue is too old to follow up on; please feel free to reopen it if the problem still exists. @sandor-lisn