Closed mason5957 closed 5 months ago
Hi, it depends on the model size and the CPU you run on. It is normal for initialization to take up to an hour because of the parameter search for channel-wise initialization.
@hatchetProject Understood, thank you for your response.
However, I encountered an error while performing the ImageNet calibration, specifically during the block reconstruction phase. Could you please advise on how to resolve this issue?
04/08/2024 06:18:02 - INFO - qdiff.layer_recon - Total loss: 0.817 (rec:0.817, round:0.000) b=2.00 count=20000
04/08/2024 06:18:02 - INFO - main - transformer_blocks False
04/08/2024 06:18:02 - INFO - main - 0 True
04/08/2024 06:18:02 - INFO - main - Reconstruction for block 0 cond True
04/08/2024 06:18:02 - INFO - qdiff.block_recon - Saving 10 intermediate results to disk to avoid OOM
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:01<00:00, 19.95it/s]
04/08/2024 06:18:04 - INFO - qdiff.utils - in 1 shape: torch.Size([200, 1024, 384]), in 2 shape: torch.Size([200, 1, 512])
04/08/2024 06:18:04 - INFO - qdiff.utils - out shape: torch.Size([200, 1024, 384])
04/08/2024 06:18:07 - INFO - qdiff.block_recon - Saving 10 intermediate results to disk to avoid OOM
...
Traceback (most recent call last):
File "sample_diffusion_ldm_imagenet.py", line 596, in
Additionally, I noticed that in quant_model.py, setattr is called twice, which seems a bit unusual. Could you please review this?
    def quant_block_refactor(self, module, weight_quant_params, act_quant_params, timewise, list_timesteps):
        for name, child_module in module.named_children():
            if type(child_module) in self.specials:
                if self.specials[type(child_module)] in [QuantBasicTransformerBlock]:
                    setattr(module, name, self.specials[type(child_module)](child_module, act_quant_params, sm_abit=self.sm_abit, timewise=timewise, list_timesteps=list_timesteps))
                setattr(module, name, self.specials[type(child_module)](child_module, act_quant_params))
            else:
                self.quant_block_refactor(child_module, weight_quant_params, act_quant_params, timewise, list_timesteps)
Thank you very much!!
Hi, I haven't encountered this issue before. Based on your message, I suspect it originates from the "checkpoint" usage in the QuantResBlock() class. You can check the data types of the inputs to make sure they are correct (torch tensors instead of numpy values). In case it is a package-version issue, I am using torch 1.13.1 and timm 0.4.12.
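One way to act on this suggestion is to list which inputs are not tensors before they reach the checkpointed call. The sketch below is only an illustration: `find_non_tensor_args` and `FakeTensor` are hypothetical names, and the check is duck-typed on the presence of a `.detach()` method (which `torch.Tensor` has and numpy scalars do not) so that it runs without torch installed.

```python
def find_non_tensor_args(args):
    """Return (index, type name) for every argument that has no .detach() method."""
    return [
        (i, type(a).__name__)
        for i, a in enumerate(args)
        if not hasattr(a, "detach")
    ]


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor, used only to keep the sketch self-contained."""

    def detach(self):
        return self


# The plain int plays the role of a numpy scalar that slipped into the inputs.
inputs = [FakeTensor(), FakeTensor(), 7]
print(find_non_tensor_args(inputs))  # [(2, 'int')]
```

In a real run, the same function could be called on the actual argument list just before the checkpoint call; any entry it reports is a candidate for conversion to a tensor.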
Thanks for pointing out the duplicated "setattr" call. I have updated the qdiff/quant_model.py file. Please use the new one :)
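For readers wondering why the duplicate call matters: a second `setattr` with the same name simply rebinds the attribute, so whatever the first call built is discarded. A minimal sketch with hypothetical stand-in classes (`Wrapper` and `Parent` are not the repository's classes, and the argument values are made up):

```python
class Wrapper:
    """Hypothetical stand-in for a quantized block; remembers its extra keyword arguments."""

    def __init__(self, child, act_quant_params, **extra):
        self.child = child
        self.act_quant_params = act_quant_params
        self.extra = extra


class Parent:
    """Hypothetical stand-in for the parent module that owns the block."""


parent = Parent()
child = object()

# First call: wrap the child with the additional timewise arguments.
setattr(parent, "block", Wrapper(child, {}, sm_abit=8, timewise=True))
first = parent.block

# Second call under the same name: the attribute is rebound, so the
# first wrapper and its extra arguments are no longer attached.
setattr(parent, "block", Wrapper(child, {}))

print(parent.block is first)  # False
print(parent.block.extra)     # {}
```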
@hatchetProject Hello, I have a question: does the line `err += loss_func(activation[k], activation_fp[k][i*head:(i+1)*head].cuda())` in post_layer_recon_imagenet.py need to be uncommented?
P.S. This line is commented out in post_layer_recon_uncond.py. Thank you very much!!
No, it doesn't need to be uncommented in pd_optimize_timewise(); you can comment out line 103 above it as well. Including it or not typically makes little difference. The line is only uncommented for Stable Diffusion, to provide finer-grained alignment.
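To make the slicing in that line concrete, here is a rough sketch of a per-chunk error accumulation. It assumes that `head` is a chunk length and `i` a chunk index, which is a reading of the line rather than something confirmed in the thread; `mse` and `chunked_error` are hypothetical helpers, and plain lists replace tensors.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def chunked_error(quantized_chunks, full_precision, head):
    """Sum the loss of each quantized chunk against the matching full-precision slice."""
    err = 0.0
    for i, chunk in enumerate(quantized_chunks):
        # Same slice pattern as in the discussed line: [i*head:(i+1)*head]
        reference = full_precision[i * head:(i + 1) * head]
        err += mse(chunk, reference)
    return err


fp = [1.0, 2.0, 3.0, 4.0]            # full-precision activations
q_chunks = [[1.0, 2.0], [3.0, 5.0]]  # quantized activations, two chunks of length 2
print(chunked_error(q_chunks, fp, head=2))  # 0.5
```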
Got it, thanks a lot.
@mason5957 Hi, I also encountered the error
AttributeError: 'numpy.int64' object has no attribute 'detach'
that you mentioned above. How did you fix it?
@cantbebetter2 Hi, I changed the class CheckpointFunction in ldm/modules/diffusionmodules/util.py at line 190 to
`class CheckpointFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, run_function, length, *args):
        ctx.run_function = run_function
        ctx.input_tensors = list(args[:length])
        ctx.input_params = list(args[length:])

        with torch.no_grad():
            output_tensors = ctx.run_function(*ctx.input_tensors)
        return output_tensors

    @staticmethod
    def backward(ctx, *output_grads):
        # print("ctx.input_tensors: ",ctx.input_tensors)
        # print(type(ctx.input_tensors))
        # for x in ctx.input_tensors:
        #     print(x.dtype)
        #     print(x)
        # assume ctx.input_tensors include NumPy arrays and PyTorch Tensors
        ctx.input_tensors = [
            x if isinstance(x, torch.Tensor)
            else torch.tensor(x, dtype=torch.float32).requires_grad_(True) if isinstance(x, (np.ndarray, np.generic))
            else x
            for x in ctx.input_tensors
        ]
        ctx.input_tensors = [x.detach().requires_grad_(True) for x in ctx.input_tensors]
        with torch.enable_grad():
            # Fixes a bug where the first op in run_function modifies the
            # Tensor storage in place, which is not allowed for detach()'d
            # Tensors.
            shallow_copies = [x.view_as(x) for x in ctx.input_tensors]
            # shallow_copies = [x.view_as(x) if idx != 2 else x.view_as(x).item() for idx, x in enumerate(ctx.input_tensors)]
            # print("shallow_copies[0]:", shallow_copies[0])
            # print("shallow_copies[1]:", shallow_copies[1])
            # print("shallow_copies[2]:", shallow_copies[2])
            output_tensors = ctx.run_function(*shallow_copies)
        input_grads = torch.autograd.grad(
            output_tensors,
            ctx.input_tensors + ctx.input_params,
            output_grads,
            allow_unused=True,
        )
        del ctx.input_tensors
        del ctx.input_params
        del output_tensors
        return (None, None) + input_grads`
@mason5957 Thanks a lot! I was surprised that you responded so quickly, and it definitely solved my problem. By the way, may I ask how long one PTQ calibration takes? I successfully ran the script for ImageNet without pd_optimize_timeembed and pd_optimize_timewise, and it already takes much longer than other PTQ methods.
@cantbebetter2 Not at all. For me, the entire process typically takes about 2 days to complete.
@mason5957 @cantbebetter2
Most of the time is spent in the reconstruction stage (for the weights). Typically we just load the weight quantization parameters and then measure the time cost and performance. It is also valid to skip the reconstruction stage, sometimes with negligible performance degradation.
If you run without pd_optimize_timeembed and pd_optimize_timewise (w8a8 and w4a8), it is just QDiffusion without activation reconstruction, so it should not take more time than QDiffusion.
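To check where the time actually goes in a given setup, each stage can be timed separately. The following is a generic sketch using only the standard library; the stage labels and the placeholder workloads are made up for illustration and would be replaced by the real calibration calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}

# Placeholder workloads; substitute the real stages of the calibration script.
with timed("weight_reconstruction", timings):
    sum(range(100_000))

with timed("activation_calibration", timings):
    sum(range(10_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")
```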
Thank you for your work. However, is it normal for the model initialization process to take up to an hour?