yhhhli / BRECQ

PyTorch implementation of BRECQ, ICLR 2021
MIT License

Some questions about implementation details #2

Closed AndreevP closed 3 years ago

AndreevP commented 3 years ago

Hello, thank you for an interesting paper and nice code.

I have two questions concerning implementation details.

  1. Does the "one-by-one" block reconstruction mentioned in the paper mean that the input to each block comes from the already-quantized preceding blocks, i.e. each block may correct quantization errors introduced by earlier blocks? Or is the input to each block collected from the full-precision model?
  2. Am I correct in my understanding that in the block-wise reconstruction objective you use the gradients for each sample in the calibration set independently (i.e. no gradient averaging or the like, as in the Adam-style scheme mentioned in the paper)? Also, what is happening here in data_utils.py: why do you add 1.0 to the gradients?
    cached_grads = cached_grads.abs() + 1.0
    # scaling to make sure its mean is 1
    # cached_grads = cached_grads * torch.sqrt(cached_grads.numel() / cached_grads.pow(2).sum())
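For what it's worth, the two input-collection strategies contrasted in question 1 can be sketched with toy linear "blocks". This is a NumPy illustration with made-up names, not the actual BRECQ code:

```python
import numpy as np

def fp_block(x, w):
    # Toy "block": a single linear map (stand-in for a residual block).
    return x @ w

def quantize(w, step=0.5):
    # Crude uniform rounding quantizer, purely for illustration.
    return np.round(w / step) * step

rng = np.random.default_rng(0)
x0 = rng.normal(size=(16, 8))                      # calibration batch
weights = [rng.normal(size=(8, 8)) for _ in range(3)]

# Strategy A: each block's input comes from the already-quantized prefix,
# so later blocks see (and can compensate for) accumulated quantization error.
x = x0
inputs_from_quantized = []
for w in weights:
    inputs_from_quantized.append(x)
    x = fp_block(x, quantize(w))

# Strategy B: each block's input is collected from the full-precision model,
# so every block is reconstructed against error-free inputs.
x = x0
inputs_from_fp = []
for w in weights:
    inputs_from_fp.append(x)
    x = fp_block(x, w)

# The first block sees identical inputs either way; later blocks diverge.
assert np.allclose(inputs_from_quantized[0], inputs_from_fp[0])
assert not np.allclose(inputs_from_quantized[2], inputs_from_fp[2])
```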

Thank you for your time and consideration!

nbasyl commented 1 year ago

Hi, I also found point 2 confusing. Have you figured out the rationale behind it?
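One plausible reading (an assumption on my part, not confirmed by the authors): the cached gradient magnitudes act as per-element weights on the block-output reconstruction error, and adding 1.0 floors every weight at 1, so elements whose gradient is near zero still contribute to the objective rather than being ignored. A toy NumPy sketch of that interpretation, not the actual BRECQ loss:

```python
import numpy as np

rng = np.random.default_rng(0)
cached_grads = rng.normal(size=(4, 8))            # stand-in for grads w.r.t. block output
weights = np.abs(cached_grads) + 1.0              # the line in question: floors weights at 1

fp_out = rng.normal(size=(4, 8))                  # full-precision block output
q_out = fp_out + 0.1 * rng.normal(size=(4, 8))    # quantized block output (perturbed)

# Gradient-weighted squared-error reconstruction objective.
weighted_loss = (weights * (q_out - fp_out) ** 2).mean()
plain_mse = ((q_out - fp_out) ** 2).mean()

# Every weight is at least 1, so no output element is dropped from the
# objective, and the weighted loss upper-bounds the plain MSE.
assert weights.min() >= 1.0
assert weighted_loss >= plain_mse
```

Without the +1.0, an element with zero gradient would get zero weight and its reconstruction error would be invisible to the optimizer; the floor interpolates between plain MSE and a purely gradient-weighted loss.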