Nice work in the paper. Besides: 1) Is there any analysis of the oscillation problem in activation quantization? Since 2-bit activation quantization is much harder than weight quantization, it is natural that the oscillation problem would also appear there. 2) It is a little confusing to me why optimizing weight quantization alone is enough to achieve the good performance reported in the paper, without any mention of activation quantization.

Hi, thanks for the questions.

In contrast to weight quantization, activation quantization has no underlying parameters that need to be learned, so the vicious oscillation cycle between the update of the learnable scaling factor and the corresponding underlying weights does not arise. Thus we still adopt LSQ for quantizing activations.
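For reference, here is a minimal sketch of what an LSQ-style activation quantizer looks like; the class name, hyperparameters, and initialization below are illustrative assumptions, not the repository's actual code. The point of the reply is visible here: the step size `s` is the only learnable parameter, with no underlying weights co-adapting with it, so the weight-side oscillation cycle has no counterpart.

```python
# A minimal sketch of LSQ-style activation quantization in PyTorch.
# `LSQActivationQuantizer` and its defaults are illustrative, not the
# paper's or repo's actual implementation.
import torch
import torch.nn as nn


def grad_scale(x, scale):
    # Scale the gradient flowing into x without changing its forward value.
    return (x - x * scale).detach() + x * scale


def round_ste(x):
    # Round with a straight-through estimator (identity gradient).
    return (x.round() - x).detach() + x


class LSQActivationQuantizer(nn.Module):
    def __init__(self, bits=2, unsigned=True):
        super().__init__()
        # Unsigned range suits post-ReLU activations, e.g. [0, 3] for 2 bits.
        self.qn = 0 if unsigned else -(2 ** (bits - 1))
        self.qp = 2 ** bits - 1 if unsigned else 2 ** (bits - 1) - 1
        self.s = nn.Parameter(torch.tensor(1.0))  # the only learnable parameter
        self.initialized = False

    def forward(self, x):
        if not self.initialized:
            # LSQ's suggested initialization: 2 * mean(|x|) / sqrt(Qp).
            with torch.no_grad():
                self.s.copy_(2 * x.abs().mean() / (self.qp ** 0.5))
            self.initialized = True
        # LSQ's gradient scaling keeps the step-size update well conditioned.
        g = 1.0 / ((x.numel() * self.qp) ** 0.5)
        s = grad_scale(self.s, g)
        # Quantize: clamp to the integer grid, round with STE, rescale.
        return round_ste(torch.clamp(x / s, self.qn, self.qp)) * s
```

Dropping such a module after each activation (e.g. after ReLU) quantizes activations independently of the weight-oscillation handling discussed in the paper.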