hatchetProject / QuEST

QuEST: Efficient Finetuning for Low-bit Diffusion Models

Some questions about the code #9

Closed: mason5957 closed this issue 1 month ago

mason5957 commented 1 month ago

I apologize for any inconvenience, and I have a few questions regarding the code snippet:

[screenshot of the code snippet in question]
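Since the original screenshot is not reproduced here, the following is a rough sketch of the structure the questions refer to, reconstructed only from the identifiers quoted below. The helper name `alignment_error` and the assumed layout of the cached dictionaries are placeholders, not the repository's actual code:

```python
# Illustrative reconstruction only -- inferred from the quoted identifiers,
# not copied from the QuEST repository.
def alignment_error(quant_layer, activation_fp, activation, loss_func):
    """activation_fp / activation: dicts of cached FP tensors (assumed layout)."""
    err = 0.0
    # Process one sample at a time rather than the whole batch (question 1).
    for i in range(activation_fp["input"][0].shape[0]):
        inp = activation_fp["input"][0][i].unsqueeze(0).cuda()
        output_quant = quant_layer(inp)
        # The same output-alignment term is accumulated once per cached key
        # (question 3); the per-key losses themselves are omitted in this sketch.
        for k in activation.keys():
            err += loss_func(output_quant, activation_fp["output"][i].unsqueeze(0).cuda())
    return err
```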

  1. Why did you choose to calculate each sample individually using the `for i in range(activation_fp["input"][0].shape[0])` loop instead of computing the entire batch at once?

  2. Could you please clarify the purpose of calculating `head`? In my testing so far, `head` always seems to be 1 (currently I've only tested with ImageNet, not with other models).

  3. I noticed that `err += loss_func(output_quant, activation_fp["output"][i].unsqueeze(0).cuda())` is repeated once for every key in `for k in activation.keys():`. Is there a specific reason for this repetition, or does it have some other significance?

I would greatly appreciate it if you could take some time to respond to these queries. Thank you very much for your assistance.

hatchetProject commented 1 month ago

Sorry about the confusion in this part. Regarding each question:

  1. This is because of limited GPU memory. If your memory is large enough, you can choose a larger batch size.
  2. `head` is not 1 when aligning other activations (e.g. q@k, v). But since we currently do not align those, it is indeed 1 and you can ignore it.
  3. This is to align the loss scale with the line above (to avoid it becoming too large). You can certainly compute that term independently and multiply it by `len(activation.keys())` instead; a minimal sketch of that variant follows below. We didn't tune the hyper-parameters closely, so feel free to change them.
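For concreteness, here is a minimal sketch of the batched variant suggested in points 1 and 3, assuming GPU memory is sufficient: compute the output-alignment loss once over the whole batch and rescale it by the number of activation keys. The helper name and the assumed tensor layout of `activation_fp["output"]` are placeholders, not the repository's code.

```python
# Minimal sketch of the suggested alternative -- placeholder names, not repo code.
def alignment_error_batched(quant_layer, activation_fp, activation, loss_func):
    # Whole batch at once: only viable when GPU memory allows (point 1).
    inp = activation_fp["input"][0].cuda()
    output_quant = quant_layer(inp)
    # Assumes the cached FP outputs are a single tensor stacked along the batch
    # dimension; if they are stored as a list, stack them first.
    target = activation_fp["output"].cuda()
    # Compute the output term once and rescale it to match the per-key sum (point 3).
    return loss_func(output_quant, target) * len(activation.keys())
```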