hatchetProject / QuEST

QuEST: Efficient Finetuning for Low-bit Diffusion Models

About w4a4 calibration model #14

Closed Yheechou closed 20 hours ago

Yheechou commented 6 days ago

Could you please provide the W4A4 Stable Diffusion model.ckpt?

Thanks a lot!

hatchetProject commented 6 days ago

Sure, we will release it as soon as possible. Our server is currently down and may take a few weeks to come back up :( I will notify you as soon as we can access the servers again. Sorry for the inconvenience.

Yheechou commented 4 days ago

Thanks !

I want to calibrate the W4A4 model myself with the command below:

$ python txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --cond --ptq --weight_bit <4 or 8> --quant_mode qdiff --quant_act --act_bit 4 --cali_st 25 --cali_batch_size 8 --cali_n 128 --no_grad_ckpt --split --running_stat --sm_abit 16 --cali_data_path ./cali_data_path/sd_coco-s75_sample1024_allst.pt --outdir ./output

How much VRAM does it need, and how long does W4A4 calibration take?

On my A100, it went out of memory (> 80 GB) :(

hatchetProject commented 3 days ago

May I know at which step it went OOM? It should work even with 48 GB of memory. As for the time cost, full quantization may take roughly one to two days, depending on the CPU.

Yheechou commented 3 days ago

I previously encountered an OOM issue in the (act recon_model) part, but it might have been due to my own mistake.

I am now running the full-quantization calibration of SD 1.4 without modifying any code. After two hours it has reached (weight recon_model) and the memory usage is 29 GB. Is that a normal value?

hatchetProject commented 3 days ago

Yes, it is a normal value :)

Yheechou commented 2 days ago

Thanks a lot for your reply :)

After one day of calibration, it is still reconstructing the weights (weight recon_model), and the memory usage is 46 GB.

I wonder whether the memory usage will grow even larger when calibrating the activation recon_model.

hatchetProject commented 2 days ago

Yes, it grows steadily during weight calibration... In our experiments it gets close to 48 GB but does not go OOM. In our case we do not do activation reconstruction but finetune directly. If you would like to try activation reconstruction, the memory usage also won't surpass 48 GB, so it stays safely under 80 GB.

Yheechou commented 2 days ago

Got it. So your W4A4 model was not obtained through activation reconstruction? How is it finetuned then?

hatchetProject commented 2 days ago

We do not do activation reconstruction (recon_model), but we still initialize it so that the basic quantization parameters exist. You can refer to lines 487~575 in txt2img.py for more details.
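Conceptually, the initialization just runs a few calibration batches through the quantized model so that every activation quantizer records its step size; a minimal sketch (with assumed helper and variable names, not the exact code from txt2img.py) would look like:

```python
import torch

def init_act_quant_params(qnn, cali_xs, cali_ts, cali_cs, batch_size=8):
    """Rough sketch: run a few calibration batches through the quantized UNet so
    each activation quantizer observes realistic ranges and sets its basic
    parameters (delta / zero point), without block-wise activation
    reconstruction. `qnn.set_quant_state` and the argument layout are assumed
    names, not necessarily the repo's exact API."""
    qnn.set_quant_state(weight_quant=True, act_quant=True)
    with torch.no_grad():
        for i in range(0, cali_xs.size(0), batch_size):
            _ = qnn(cali_xs[i:i + batch_size].cuda(),
                    cali_ts[i:i + batch_size].cuda(),
                    cali_cs[i:i + batch_size].cuda())
    # After these passes the quantization parameters exist and can be finetuned
    # directly instead of being reconstructed.
```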

Yheechou commented 2 days ago

I see. Thanks a lot :)

Yheechou commented 1 day ago

The calibration ran up to line 570, and then this error occurred:

TypeError: pd_optimize_timeembed() got an unexpected keyword argument 'cond'

I checked the code, and the function does not take a 'cond' keyword argument. How can this be solved? Do we need to start training from the very beginning?

hatchetProject commented 1 day ago

Sorry for this bug; I have fixed the code by deleting the "cond" parameter in txt2img.py.

In fact, you do not need to start from the beginning. You have already finished the weight reconstruction and activation initialization, and the model should be saved in a folder. You can directly load that checkpoint by adding two arguments to the command line, "--resume --cali_ckpt MODEL_PATH", and it will load the checkpoint and continue from line 573.
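For reference, reusing your earlier flags, the resume invocation would look roughly like this (MODEL_PATH is a placeholder for your saved checkpoint path):

$ python txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --cond --ptq --weight_bit 4 --quant_mode qdiff --quant_act --act_bit 4 --cali_st 25 --cali_batch_size 8 --cali_n 128 --no_grad_ckpt --split --running_stat --sm_abit 16 --cali_data_path ./cali_data_path/sd_coco-s75_sample1024_allst.pt --outdir ./output --resume --cali_ckpt MODEL_PATH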

Yheechou commented 1 day ago

Thanks for the fix. Another error occurs when I add "--resume --cali_ckpt MODEL_PATH".

It happens in utils.resume_cali_model(); everything runs fine until line 433, where the AssertionError below is raised:

File "/home/yizhou/QuEST/qdiff/quant_block.py", line 325, in _forward
    assert(len(x) == 2)

I think maybe the cond lines are flipped?

hatchetProject commented 1 day ago

Yep, just swap lines 433 and 435.

Yheechou commented 1 day ago

After pd_optimize_timeembed() at line 575, an error occurs in pd_optimize_timewise():

" File "/home/yizhou/QuEST/qdiff/post_layer_recon_sd.py", line 99, in pd_optimize_timewise err.backward() File "/home/yizhou/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs) File "/home/yizhou/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/init.py", line 173, in backward Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass File "/home/yizhou/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/function.py", line 253, in apply return user_fn(self, *args) File "/home/yizhou/QuEST/ldm/modules/diffusionmodules/util.py", line 139, in backward input_grads = torch.autograd.grad( File "/home/yizhou/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/init.py", line 275, in grad return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass RuntimeError: One of the differentiated Tensors does not require grad"

Thanks for your solution! I also want to know whether pd_optimize_timeembed() and pd_optimize_timewise() need to be run every time?

hatchetProject commented 1 day ago

If I remember correctly, some effect of torch.no_grad() may be involved, so you may need to wrap that code block in with torch.enable_grad(): (e.g. between lines 85 and 86).
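For a self-contained illustration of what goes wrong (the variable names here are made up; this is not the repo's code):

```python
import torch

# Inside a torch.no_grad() region, a reconstruction loss cannot be
# backpropagated; wrapping the computation in torch.enable_grad() re-enables
# autograd locally, which is the idea behind the suggested edit around the
# loss in pd_optimize_timewise.
delta = torch.randn(4, requires_grad=True)   # stands in for a quantizer step size
x = torch.randn(4)

with torch.no_grad():
    with torch.enable_grad():                # re-enable gradient tracking locally
        err = ((x / delta).round() * delta - x).pow(2).mean()
        err.backward()                       # works; without enable_grad() it raises

print(delta.grad)
```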

You can run pd_optimize_timeembed() and pd_optimize_timewise() separately to see the effect of each (by commenting out one of the lines and reloading the model), or you can run them together :)

Yheechou commented 1 day ago

I wonder why the calibrated quantized UNet model needs to be saved each time between lines 579 and 600. For inference I would use --ptq --resume, so this code would run again every time I run inference.

Another error occurs after pd_optimize_timeembed(), at the sampler call on line 651 of txt2img.py:

File "/home/yizhou/QuEST/qdiff/quant_layer.py", line 543, in set_timestep
    self.act_quantizer.current_delta = self.act_quantizer.quantizer_dict[t].delta
KeyError: 961

hatchetProject commented 1 day ago

You can comment out the model-saving lines for inference; it is flexible.

What is the full error message? A likely source of sampling errors is "ldm/models/diffusion/plms.py", lines 419~427. If you are using 25 timesteps for calibration, then uncomment the first few lines there and comment out line 427 (I have also updated the repo).
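For context, the KeyError most likely means the sampler passed a timestep (961) that is not among the keys of quantizer_dict, which presumably only contains the timesteps seen during calibration, so the inference schedule has to match the calibrated one. As a purely illustrative workaround (a hypothetical helper, not the repo's fix, which is the plms.py edit above), one could snap an unseen timestep to the nearest calibrated key before the lookup:

```python
def nearest_calibrated_t(quantizer_dict, t):
    """Hypothetical helper: return the calibrated timestep closest to t so that
    quantizer_dict lookups do not fail when the inference schedule differs
    from the calibration schedule."""
    if t in quantizer_dict:
        return t
    return min(quantizer_dict.keys(), key=lambda k: abs(k - t))
```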

Yheechou commented 23 hours ago

That works now that I have updated the plms code; I am using 25 timesteps.

The other question is whether it is necessary to run pd_optimize_timeembed() every time at inference.

And my result for W4A4 is below:

(image: grid-0001)

Is that normal?

hatchetProject commented 23 hours ago

No; after pd_optimize_timeembed() and pd_optimize_timewise() you can save the resulting model, and during inference they no longer need to be called.

Yes, the results are normal.

Yheechou commented 22 hours ago

OK, thanks for answering so many questions! Best of luck with your research, you have been very kind.