Closed: Yheechou closed this issue 20 hours ago
Sure, we will release it as soon as possible. Currently our server is down for some reason and may take a few weeks to come back up :( I will notify you as soon as we can access the servers again. Sorry for the inconvenience.
Thanks !
I want to get the w4a4 model myself, using the command below: $ python txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --cond --ptq --weight_bit <4 or 8> --quant_mode qdiff --quant_act --act_bit 4 --cali_st 25 --cali_batch_size 8 --cali_n 128 --no_grad_ckpt --split --running_stat --sm_abit 16 --cali_data_path ./cali_data_path/sd_coco-s75_sample1024_allst.pt --outdir ./output
How much VRAM is needed, and how long does it take to calibrate w4a4?
On my A100, it ran out of memory (> 80GB) :(
May I know at which step it went OOM? It should work even with 48GB of memory. As for the time cost, it may take nearly one to two days (full quantization) depending on the CPU.
I previously encountered an OOM issue in the (act recon_model) part, but it might have been due to my own mistake.
I am now running the full quantization calibration of SD 1.4 without modifying any code. After two hours, it reached (weight recon_model) and memory usage is 29GB. Is this a normal value?
Yes, it is a normal value :)
Thanks a lot for your reply :)
After one day of calibration, it is still calibrating weights (weight recon_model), and memory usage is 46GB.
I wonder if the memory usage will be large when calibrating the activation recon_model.
Yes, it will grow steadily during weight calibration... In our experiments it gets close to 48GB but does not go OOM. In our case, we do not do activation reconstruction but directly finetune. If you would like to try activation reconstruction, the memory usage also won't surpass 48GB and will stay safely under 80GB.
I got it. So your w4a4 model was not from activation reconstruction? How is it finetuned then?
We do not do activation reconstruction (recon_model), but we still initialize it so that we have the basic quantization parameters. You can refer to lines 487~575 in txt2img.py for more details.
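As a rough illustration of what "initialize without reconstruction" means (a toy sketch, not the repo's actual code; the function name `init_act_quant_params` and the plain min/max rule are assumptions for this example), a uniform activation quantizer's parameters can be derived from calibration statistics alone:

```python
import numpy as np

def init_act_quant_params(act_samples, n_bits=4):
    # Hypothetical helper: derive uniform-quantizer parameters (step size
    # delta and integer zero point) directly from calibration activations,
    # with no reconstruction/optimization pass afterwards.
    x_min, x_max = float(act_samples.min()), float(act_samples.max())
    n_levels = 2 ** n_bits
    delta = (x_max - x_min) / (n_levels - 1)  # quantization step size
    zero_point = round(-x_min / delta)        # integer offset mapping x_min to 0
    return delta, zero_point

# usage sketch: 4-bit params for activations spanning [-1, 2]
delta, zp = init_act_quant_params(np.linspace(-1.0, 2.0, 100), n_bits=4)
```

A reconstruction step (recon_model) would then refine these values against layer outputs; skipping it keeps only this cheap initialization.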
I see. Thanks a lot :)
We calibrated the model up to line 570, and this error occurred:
TypeError: pd_optimize_timeembed() got an unexpected keyword argument 'cond'
and I checked the code; the function doesn't have the keyword 'cond'. How can this be solved? Do we need to train from the very beginning?
Sorry for this bug; I have fixed the code by deleting the "cond" parameter in txt2img.py.
In fact, you do not need to start from the beginning. Currently you have finished the weight reconstruction and activation initialization, and the model should be saved in a folder. You can now directly load the model checkpoint by adding these two parameters to the command line: "--resume --cali_ckpt MODEL_PATH", and it will load the checkpoint and continue at line 573.
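As a minimal sketch of what such a resume path typically does (the function name `resume_from_ckpt` and the `strict=False` handling are assumptions for illustration, not the repo's actual `resume_cali_model`), loading the saved checkpoint lets the run skip the finished weight-reconstruction stage:

```python
import torch
import torch.nn as nn

def resume_from_ckpt(model, ckpt_path):
    # Illustrative resume: load the previously saved state so the
    # already-finished calibration stages are not repeated.
    state = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state, strict=False)  # tolerate extra/missing keys
    return model

# usage sketch with a stand-in module
layer = nn.Linear(4, 4)
torch.save(layer.state_dict(), "/tmp/cali_ckpt.pth")
restored = resume_from_ckpt(nn.Linear(4, 4), "/tmp/cali_ckpt.pth")
```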
Thanks for fixing. Another error occurs when adding "--resume --cali_ckpt MODEL_PATH":
It happens in utils.resume_cali_model(); it runs fine until line 433, then the AssertionError below occurs:
"File "/home/yizhou/QuEST/qdiff/quant_block.py", line 325, in _forward assert(len(x) == 2)"
I think maybe the cond lines are flipped?
Yep, just swap lines 433 and 435.
After line 575 pd_optimize_timeembed(), an error occurs in pd_optimize_timewise():
" File "/home/yizhou/QuEST/qdiff/post_layer_recon_sd.py", line 99, in pd_optimize_timewise err.backward() File "/home/yizhou/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs) File "/home/yizhou/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/init.py", line 173, in backward Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass File "/home/yizhou/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/function.py", line 253, in apply return user_fn(self, *args) File "/home/yizhou/QuEST/ldm/modules/diffusionmodules/util.py", line 139, in backward input_grads = torch.autograd.grad( File "/home/yizhou/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/init.py", line 275, in grad return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass RuntimeError: One of the differentiated Tensors does not require grad"
Thanks for your solution! I also want to know whether pd_optimize_timeembed() and pd_optimize_timewise() need to be run every time?
If I remember correctly, some effects of torch.no_grad() might have occurred, so you may need to add:
with torch.enable_grad():
around the code block (e.g. between line 85 and line 86).
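For context, torch.enable_grad() locally re-enables autograd even when an outer torch.no_grad() scope has disabled it, which is exactly the situation that produces the "does not require grad" RuntimeError above. A minimal standalone sketch (unrelated to the repo's actual lines 85-86):

```python
import torch

x = torch.ones(3, requires_grad=True)
with torch.no_grad():
    # Autograd is disabled here; a tensor built in this scope has no
    # grad_fn, so calling backward() on it would raise the
    # "does not require grad" RuntimeError.
    with torch.enable_grad():
        # Locally re-enable gradient tracking for the inner block.
        y = (x * 2).sum()
        y.backward()

print(x.grad)  # d(2 * x.sum()) / dx = 2 for every element
```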
You can do pd_optimize_timeembed() and pd_optimize_timewise() separately to see each's effect (by commenting a line and reloading the model). You can also do them together :)
I wonder why the calibrated quantized UNet model should be saved each time between lines 579 - 600. For inference, I would use --ptq --resume, so each time I run inference, this code would run again.
Another error occurs after pd_optimize_timeembed(), in txt2img.py line 651 sampler:
"File "/home/yizhou/QuEST/qdiff/quant_layer.py", line 543, in set_timestep
    self.act_quantizer.current_delta = self.act_quantizer.quantizer_dict[t].delta
KeyError: 961"
You can comment out the model-saving lines for inference; it is flexible.
What is the whole error message? Something that might cause a sampling error can be found in "ldm/models/diffusion/plms.py" lines 419~427. If you are using 25 timesteps for calibration, then uncomment the first few lines and comment out line 427 (I have also updated the repo).
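To illustrate why the KeyError appears (a toy sketch, not the repo's code): per-timestep quantizer parameters live in a dict keyed by the calibration timesteps, so sampling with a timestep schedule that differs from calibration makes the lookup fail. `nearest_timestep_delta` below is a hypothetical fallback for illustration, not the repo's actual fix:

```python
# Toy model of per-timestep activation quantizer parameters,
# keyed by the timesteps seen during calibration.
calibrated = {1: 0.10, 501: 0.20, 981: 0.30}  # delta per calibration timestep

def nearest_timestep_delta(quantizer_dict, t):
    # Hypothetical fallback: when the sampler queries a timestep the
    # calibration never saw (e.g. 961, a plain dict lookup would raise
    # KeyError), reuse the delta of the nearest calibrated timestep.
    if t in quantizer_dict:
        return quantizer_dict[t]
    nearest = min(quantizer_dict, key=lambda k: abs(k - t))
    return quantizer_dict[nearest]

delta = nearest_timestep_delta(calibrated, 961)  # falls back to t=981
```

Aligning the sampler's timestep schedule with the calibration schedule (as the updated plms.py does) removes the mismatch at the source.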
That works after I updated the plms code; I am using 25 timesteps.
The other question is whether it is necessary to do pd_optimize_timeembed() each time at inference.
And my result for w4a4 was below:
Is that normal?
No, after pd_optimize_timeembed() and pd_optimize_timewise() you can save the corresponding model. And during inference they no longer need to be called.
Yes, the results are normal.
OK, thanks for answering so many questions! Best wishes for your research, you're very kind.
Could you please provide the w4a4 stable diffusion model.ckpt?
Thanks a lot!