I apologize for any inconvenience, and I have a few questions regarding the code snippet:
Why did you choose to calculate each sample individually with the `for i in range(activation_fp["input"][0].shape[0])` loop instead of computing the entire batch at once?
Could you clarify the purpose of calculating `head`? In my testing so far, `head` always seems to be 1 (I've only tested with ImageNet so far, not with other models).
I noticed that `err += loss_func(output_quant, activation_fp["output"][i].unsqueeze(0).cuda())` is repeated once per key in `for k in activation.keys():`. Is there a specific reason for this repetition, or does it have some other significance?
I would greatly appreciate it if you could take some time to respond to these queries. Thank you very much for your assistance.
Sorry about the confusion in this part. Regarding each question:
This is for limited GPU memory. If your memory is large enough, you can choose a larger batch size.
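To illustrate the trade-off, here is a minimal NumPy sketch (not the repository's actual code; shapes and the MSE loss are assumptions). Looping over samples keeps peak memory proportional to one sample instead of the whole batch, and yields the same loss up to averaging:

```python
import numpy as np

# Hypothetical shapes: a batch of 8 samples with 16 features each.
rng = np.random.default_rng(0)
output_quant = rng.standard_normal((8, 16))
output_fp = rng.standard_normal((8, 16))

def mse(a, b):
    """Mean squared error over all elements."""
    return float(np.mean((a - b) ** 2))

# Whole batch at once: one allocation proportional to the batch size.
err_batch = mse(output_quant, output_fp)

# Per-sample loop: peak memory proportional to a single sample,
# at the cost of a Python-level loop (mirrors the range(...shape[0]) loop).
err_loop = 0.0
for i in range(output_quant.shape[0]):
    err_loop += mse(output_quant[i:i + 1], output_fp[i:i + 1])
err_loop /= output_quant.shape[0]  # average over samples

# Since every sample has the same number of elements, the two agree.
assert np.isclose(err_batch, err_loop)
```

The same reasoning carries over to GPU tensors: the per-sample version only ever materializes one sample's activations on the device at a time.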
`head` is not 1 when aligning other activations (e.g. `q@k`, `v`), but we currently do not align those, so yes, it is 1 and you can ignore its usage.
This keeps the loss scale aligned with the line above (so that term does not grow too large relative to the others). You could certainly compute it once independently and multiply by `len(activation.keys())`. We did not tune these hyper-parameters closely, so feel free to change them.
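A tiny sketch of the equivalence mentioned above, with made-up key names and loss values (all numbers here are hypothetical, not from the repository):

```python
# Hypothetical per-key alignment losses and a shared output-alignment loss.
keys = ["q@k", "v", "out"]
per_key_err = [0.3, 0.5, 0.2]
output_err = 0.4

# As written in the loop: output_err is re-added once per key, so its
# total contribution scales with len(keys), matching the per-key terms.
total_loop = sum(e + output_err for e in per_key_err)

# Equivalent form: add the output term once and multiply by the key count.
total_once = sum(per_key_err) + output_err * len(keys)

assert abs(total_loop - total_once) < 1e-12
```

Either form gives the same total; the in-loop version simply keeps each iteration's contribution on a comparable scale.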