siliconflow / onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.
https://github.com/siliconflow/onediff/wiki
Apache License 2.0

Quant: only small speedups on A100 #761

Closed sandor-lisn closed 5 months ago

sandor-lisn commented 5 months ago

Describe the bug

Using quantization yields only minimal speedups on an A100.

Your environment

OS

$ uname -a
Linux jean-zay4 4.18.0-372.91.1.el8_6.x86_64 #1 SMP Tue Jan 30 11:06:32 EST 2024 x86_64 x86_64 x86_64 GNU/Linux

OneDiff git commit id

85840be35d24d7b06efc9028b6322b23845ab1fe

OneFlow version info

path: ['/gpfswork/rech/xyi/utb83nr/miniconda3/envs/onediff/lib/python3.10/site-packages/oneflow']
version: 0.9.1.dev20240322+cu121
git_commit: 0523481
cmake_build_type: Release
rdma: True
mlir: True
enterprise: True

How To Reproduce

quant.json non-quant.json

The complete error message

N/A

Additional context

N/A

strint commented 5 months ago

Which quantized model are you using from here: https://huggingface.co/collections/siliconflow/onediff-enterprise-for-comfyui-65aa4d455860f06ff2808f05? @sandor-lisn

sandor-lisn commented 5 months ago

I was trying to follow your instructions here: https://huggingface.co/siliconflow/sdxl-base-1.0-onediff-comfy-enterprise-v1

I put https://huggingface.co/siliconflow/sdxl-base-1.0-onediff-comfy-enterprise-v1/resolve/main/sd_xl_base_1.0_quantize_info.pt?download=true into ComfyUI/models/onediff_quant

As the checkpoint, I used sd_xl_base_1.0_0.9vae.safetensors loaded from ComfyUI/models/checkpoints.

hjchen2 commented 5 months ago

The comfy enterprise v1 model is quantized together with DeepCache. Since DeepCache already causes some loss of quality, only part of the linear modules are quantized in this model in order to preserve quality, and the conv modules are not quantized at all.
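
If you want to see how much of the UNet that quantize_info file actually covers, something along the lines of the sketch below can help. It assumes the .pt file loads with torch.load and behaves like a dict keyed by module name; that is an assumption about the format, not something documented here, so adapt it to whatever the file really contains.

```python
# Rough sketch: count conv-like vs. other entries in a quantize_info file.
# Assumes the .pt file loads as a dict keyed by module name (an assumption;
# the real format may differ).
import torch

path = "ComfyUI/models/onediff_quant/sd_xl_base_1.0_quantize_info.pt"
info = torch.load(path, map_location="cpu")

conv_keys = [k for k in info if "conv" in str(k).lower()]
other_keys = [k for k in info if "conv" not in str(k).lower()]
print(f"total entries: {len(info)}, conv-like: {len(conv_keys)}, "
      f"other (linear/attention): {len(other_keys)}")
```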

~You can try to use comfy's quantization workflow to quantize an sdxl-base model without DeepCache yourself. Please refer to https://github.com/siliconflow/onediff/tree/main/onediff_comfy_nodes#quantization~

@doombeaker Can you provide a comfy quantization workflow for sandor?

doombeaker commented 5 months ago

@sandor-lisn

Here is the quantization workflow; you can quantize the model yourself and set the related hyperparameters below as needed.

The "Quant K Sampler" is the key node contains almost all the hype parameters.

| Parameter name | Description | Type | Default value |
| --- | --- | --- | --- |
| bits | Number of bits | INT | 8 |
| quantize_conv | Whether to quantize the convolution layer | STRING | enable |
| quantize_linear | Whether to quantize the linear layer | STRING | enable |
| conv_mse_threshold | MSE threshold for quantizing vs not quantizing this convolution layer | FLOAT | 0.1 |
| linear_mse_threshold | MSE threshold for quantizing vs not quantizing this linear layer | FLOAT | 0.1 |
| compute_density_threshold | When the computation density of this layer is below this threshold, do not quantize | INT | 0 |
| save_filename_prefix | Prefix of the configuration file name for the saved quantized model | STRING | unet |
| overwrite | Whether to overwrite the existing quantization configuration file | STRING | enable |
| static_mode | Use oneflow_compile when calculating output | STRING | enable |

The quantization configuration file will be saved in ComfyUI/models/onediff_quant/.

mse_threshold: the smaller the MSE, the more similar the quantized output is to the original one.

Comfy's new quantization tool takes about 35 minutes to quantize an SDXL model (image: 1024x1024, steps: 20, NVIDIA A100-PCIE-40GB).
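
To make the mse_threshold idea concrete, here is a small self-contained sketch of the general selection logic (it is not onediff's actual implementation): simulate an int8 round trip of a layer's weight, compare the resulting output with the float output, and keep the layer quantized only if the MSE stays below the threshold.

```python
# Illustrative sketch of MSE-threshold layer selection (not onediff's code):
# a layer stays quantized only if the error it introduces is small enough.
import torch
import torch.nn.functional as F

def fake_int8_quant(w: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor int8 round trip, used only to estimate the error.
    scale = w.abs().max() / 127.0
    return torch.round(w / scale).clamp(-128, 127) * scale

def should_quantize(layer: torch.nn.Linear, x: torch.Tensor,
                    mse_threshold: float = 0.1) -> bool:
    with torch.no_grad():
        ref = layer(x)                                                 # float output
        out = F.linear(x, fake_int8_quant(layer.weight), layer.bias)  # simulated int8 output
        mse = torch.mean((ref - out) ** 2).item()
    return mse < mse_threshold

layer = torch.nn.Linear(320, 320)
x = torch.randn(4, 320)
print(should_quantize(layer, x, mse_threshold=0.1))
```

compute_density_threshold plays a similar gating role, but based on how much arithmetic a layer does rather than on its error, so layers that do very little compute, and presumably gain little from int8, can be skipped.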

hjchen2 commented 5 months ago

@sandor-lisn

And the DeepCache SpeedUp node should be disabled in this workflow. The recommended value of compute_density_threshold is 300, with the default values kept for the other parameters.

sandor-lisn commented 5 months ago

Thank you very much for your detailed answers!

I think that I understand a little better now, but I am still confused... Is there any documentation about the interplay of deepcache, quantization, and onediff? It would be very helpful for me to better understand the bigger picture.

A problem for me is that I can't run the workflow that doombeaker posted above. I am having serious trouble running DeepCache; see my other open issue: https://github.com/siliconflow/onediff/issues/763

strint commented 5 months ago

This issue is too old to follow up on; please feel free to reopen it if the problem still exists. @sandor-lisn