Closed: sandor-lisn closed this issue 5 months ago.
Which quantized model are you using from here: https://huggingface.co/collections/siliconflow/onediff-enterprise-for-comfyui-65aa4d455860f06ff2808f05? @sandor-lisn
I was trying to follow your instructions here: https://huggingface.co/siliconflow/sdxl-base-1.0-onediff-comfy-enterprise-v1
I put https://huggingface.co/siliconflow/sdxl-base-1.0-onediff-comfy-enterprise-v1/resolve/main/sd_xl_base_1.0_quantize_info.pt?download=true into ComfyUI/models/onediff_quant
As the checkpoint, I used sd_xl_base_1.0_0.9vae.safetensors loaded from ComfyUI/models/checkpoints.
The comfy enterprise v1 model is quantized together with DeepCache. Since DeepCache leads to a decrease in quality, to preserve quality only some of the linear modules in this model are quantized, and the conv modules are not quantized at all.
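As a minimal illustration of this selective scheme (not OneDiff's actual code; the module representation and function name here are hypothetical), a Python sketch that marks only linear modules for quantization and leaves conv modules in full precision:

```python
# Sketch of selective quantization: walk a model's modules and mark only
# the linear layers for int8 quantization, leaving conv layers alone.
# The (name, kind) module list is an illustrative stand-in for a real
# module tree; this is not OneDiff's API.

def select_modules_to_quantize(modules):
    """modules: list of (name, kind) pairs, kind in {"linear", "conv", ...}."""
    selected = []
    for name, kind in modules:
        if kind == "linear":  # quantize linear layers only
            selected.append(name)
        # conv (and everything else) stays in full precision
    return selected

unet_modules = [
    ("down.0.attn.to_q", "linear"),
    ("down.0.conv_in", "conv"),
    ("mid.attn.to_k", "linear"),
    ("up.1.conv_out", "conv"),
]
print(select_modules_to_quantize(unet_modules))
# ['down.0.attn.to_q', 'mid.attn.to_k']
```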
~You can try to use comfy's quantization workflow to quantize an sdxl-base model without DeepCache yourself. Please refer to https://github.com/siliconflow/onediff/tree/main/onediff_comfy_nodes#quantization~
@doombeaker Can you provide a comfy quantization workflow for sandor?
@sandor-lisn
Here is the quantization workflow; you can quantize the model yourself and set the hyperparameters below as needed.
The "Quant K Sampler" is the key node; it contains almost all of the hyperparameters.
| Parameter name | Description | Type | Default Value |
|---|---|---|---|
| bits | Number of bits | INT | 8 |
| quantize_conv | Whether to quantize the convolution layers | STRING | enable |
| quantize_linear | Whether to quantize the linear layers | STRING | enable |
| conv_mse_threshold | MSE threshold for deciding whether to quantize a convolution layer | FLOAT | 0.1 |
| linear_mse_threshold | MSE threshold for deciding whether to quantize a linear layer | FLOAT | 0.1 |
| compute_density_threshold | Do not quantize a layer whose computation density is below this threshold | INT | 0 |
| save_filename_prefix | Prefix of the configuration file name for the saved quantized model | STRING | unet |
| overwrite | Whether to overwrite an existing quantization configuration file | STRING | enable |
| static_mode | Use oneflow_compile when calculating output | STRING | enable |
The quantization configuration file will be saved in ComfyUI/models/onediff_quant/.
mse_threshold: the smaller the MSE, the more similar the quantized output is to the full-precision output.
Comfy's new quantization tool takes about 35 minutes to quantize an SDXL model (image: 1024x1024, steps: 20, NVIDIA A100-PCIE-40GB).
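To make the MSE thresholds concrete, here is a small self-contained Python sketch of the per-layer decision the thresholds describe: fake-quantize the weights to 8 bits, compare the layer's output against the full-precision output, and quantize the layer only if the MSE stays below the threshold. The toy dot-product "layer" and helper names are illustrative assumptions, not OneDiff's actual measurement code.

```python
# Illustrative sketch of the per-layer check behind conv_mse_threshold /
# linear_mse_threshold: quantize a layer only if its fake-quantized
# output stays close (low MSE) to the full-precision output.
# Pure Python; not OneDiff's actual code.

def fake_quantize(ws, bits=8):
    """Round weights onto a symmetric integer grid and back to float."""
    qmax = 2 ** (bits - 1) - 1              # 127 for int8
    scale = max(abs(w) for w in ws) / qmax
    return [round(w / scale) * scale for w in ws]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def should_quantize(weights, inputs, threshold=0.1):
    qweights = fake_quantize(weights)
    # Toy "layer": a dot product of the weights with each input vector.
    full = [sum(w * x for w, x in zip(weights, xs)) for xs in inputs]
    quant = [sum(w * x for w, x in zip(qweights, xs)) for xs in inputs]
    return mse(full, quant) < threshold

weights = [0.7, -1.3, 0.05, 2.1]
inputs = [[1.0, 0.5, -0.2, 0.3], [0.1, -0.4, 0.9, 0.6]]
print(should_quantize(weights, inputs, threshold=0.1))
# True  (int8 rounding error is tiny here, so MSE << 0.1)
```

Lowering the threshold keeps more layers in full precision (better quality, less speedup); raising it quantizes more layers.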
And the DeepCache SpeedUp node should be disabled in this workflow. The recommended value of compute_density_threshold is 300; use the default values for the other parameters.
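The thread does not define OneDiff's exact "computation density" metric, so as a hedged illustration, the sketch below takes it to be FLOPs per weight parameter: layers that do little arithmetic per weight (e.g. convs on small feature maps) gain less from int8 and are skipped under a threshold of 300. All numbers and layer names are made up for the example.

```python
# Illustrative sketch of a compute_density_threshold-style check.
# "Computation density" is assumed here to be FLOPs per weight
# parameter (an illustrative proxy, not OneDiff's documented metric).

def linear_density(tokens, in_f, out_f):
    flops = 2 * tokens * in_f * out_f       # multiply-accumulate count
    params = in_f * out_f
    return flops // params                   # simplifies to 2 * tokens

def conv_density(h, w, k, c_in, c_out):
    flops = 2 * h * w * k * k * c_in * c_out
    params = k * k * c_in * c_out
    return flops // params                   # simplifies to 2 * h * w

THRESHOLD = 300
layers = {
    "attn.to_q (4096 tokens)": linear_density(4096, 640, 640),
    "conv on 8x8 feature map": conv_density(8, 8, 3, 1280, 1280),
}
for name, density in layers.items():
    action = "quantize" if density >= THRESHOLD else "skip"
    print(f"{name}: density={density} -> {action}")
# attn.to_q (4096 tokens): density=8192 -> quantize
# conv on 8x8 feature map: density=128 -> skip
```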
Thank you very much for your detailed answers!
I think that I understand a little better now, but I am still confused... Is there any documentation about the interplay of deepcache, quantization, and onediff? It would be very helpful for me to better understand the bigger picture.
A problem for me is that I can't run the workflow that doombeaker posted above. I have big problems running deepcache, see my other open issue: https://github.com/siliconflow/onediff/issues/763
This issue is too old to follow up on; please feel free to reopen it if the problem still exists. @sandor-lisn
Describe the bug
Using quantization alone yields only minimal speedups on an A100.
Your environment
OS
$ uname -a
Linux jean-zay4 4.18.0-372.91.1.el8_6.x86_64 #1 SMP Tue Jan 30 11:06:32 EST 2024 x86_64 x86_64 x86_64 GNU/Linux
OneDiff git commit id
85840be35d24d7b06efc9028b6322b23845ab1fe
OneFlow version info
path: ['/gpfswork/rech/xyi/utb83nr/miniconda3/envs/onediff/lib/python3.10/site-packages/oneflow']
version: 0.9.1.dev20240322+cu121
git_commit: 0523481
cmake_build_type: Release
rdma: True
mlir: True
enterprise: True
How To Reproduce
quant.json non-quant.json
The complete error message
N/A
Additional context
N/A