google / qkeras

QKeras: a quantization deep learning library for Tensorflow Keras
Apache License 2.0

Can QKeras support full integer quantization? #111

Open KaiyiZhang-uo opened 1 year ago

KaiyiZhang-uo commented 1 year ago

I know I can get the weights and biases to INT8 by setting:

kernel_quantizer=quantized_bits(bits=8, integer=7, alpha=1)
bias_quantizer=quantized_bits(bits=8, integer=7, alpha=1)

However, sometimes the input tensor is still float because we have to normalize the input. So my question is: can QKeras support quantization of the input tensor?

jurevreca12 commented 1 year ago

Just add qmodel.add(QActivation("quantized_relu(bits=8, integer=7)")) as the first layer, and you will have quantized inputs.

With regard to normalization: you can still normalize the input and then scale it to fit the integer range this quantizer covers. In this case that is 128.
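
A minimal sketch of that suggestion (the input shape, layer sizes, and hidden-layer quantizers are illustrative assumptions, not from this thread):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input
from qkeras import QActivation, QDense, quantized_bits, quantized_relu

qmodel = Sequential([
    Input(shape=(16,)),
    # First layer: quantize the (normalized and rescaled) inputs.
    QActivation(quantized_relu(bits=8, integer=7)),
    QDense(
        8,
        kernel_quantizer=quantized_bits(bits=8, integer=7, alpha=1),
        bias_quantizer=quantized_bits(bits=8, integer=7, alpha=1),
    ),
    QActivation(quantized_relu(bits=8, integer=7)),
])
qmodel.summary()
```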

KaiyiZhang-uo commented 1 year ago

@jurevreca12 Thanks for your reply.

After I normalize my input, how do I scale it to fit the integer range this quantizer covers? (In this case it should be [-128, 128].) Does QKeras have a built-in function to do it? (It looks like quantized_relu() cannot achieve this because the 'quantized_relu' object has no attribute 'scale'.)

Or should I scale the normalized input manually?

jurevreca12 commented 1 year ago

You could just scale it manually as part of the pre-processing. I don't think there is any built-in function for this in QKeras.
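
For example, a hypothetical pre-processing step (the function name, the [0, 1] normalization, and the 128 target range are assumptions for illustration) could look like:

```python
import numpy as np

def preprocess(x_raw):
    x = x_raw.astype(np.float32)
    x = (x - x.min()) / (x.max() - x.min())  # normalize to [0, 1]
    return x * 128.0                         # rescale to the quantizer's integer range
```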

KaiyiZhang-uo commented 1 year ago

@jurevreca12 Thanks for your feedback. So, does QKeras support full integer quantization? For full integer quantization, we have to calibrate or estimate the range, i.e., (min, max), of all floating-point tensors in the model. Unlike constant tensors such as weights and biases, variable tensors such as the model input, activations (outputs of intermediate layers), and the model output cannot be calibrated unless we run a few inference cycles. In TFLite, we can use a representative dataset to calibrate them. It looks like QKeras does not support the use of a representative dataset. Does that mean QKeras does not support full integer quantization? Is there an alternative way to achieve full integer quantization in QKeras?

jurevreca12 commented 1 year ago

QKeras does support full integer quantization, where the only operations are integer multiply, integer add, shift, and compare (e.g. to compute ReLU). However, QKeras is made for quantization-aware training, which gives better results but can be a little complicated for beginners.

One simple example would be to create a fully-integer network for MNIST. The inputs range between 0 and 255, so you can set the input quantizer to quantized_bits(bits=8, integer=0, keep_negative=False). You don't really lose any information with this, since the inputs (if you use the TensorFlow Datasets version of MNIST) are floats with the values 0, 1, 2, ..., 255. Actually, you don't even need the input quantizer in this case, but I like to add it so that I know later exactly what my input quantization is.

Next you need to quantize the weights and biases; for this you can use practically any quantizer. Lastly, you need to pick an appropriate output function, e.g. ReLU. When you then add an input quantization to the next layer, you will have defined the quantization of that layer.

This may be a little confusing, but the way to think of it is that you are approximating integer computation with float computation (QKeras/TF still does float computation in the background), so you are limiting each tensor's values to a specific set. An alternative way to achieve full integer quantization is Brevitas in PyTorch (although it is quite similar to QKeras). TFLite can also be used, but there you can only achieve 8-bit quantization.

jurevreca12 commented 1 year ago

One additional thing that confuses beginners is scaling factors. For exploring how QKeras works, I recommend setting alpha=1 for all quantizers. This way you will more easily see which integer values each tensor represents.
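
A small illustration of that point (the weights are made up; the exact value of .scale depends on the quantizer configuration):

```python
import numpy as np
from qkeras import quantized_bits

w = np.random.uniform(-0.5, 0.5, size=(4, 4)).astype("float32")

q_fixed = quantized_bits(bits=8, integer=0, alpha=1)      # no data-dependent scale
q_auto = quantized_bits(bits=8, integer=0, alpha="auto")  # scale estimated from the data

print(q_fixed(w))   # values land directly on the fixed 8-bit grid
q_auto(w)           # applying the quantizer populates its scale
print(q_auto.scale) # per-channel scale chosen from w
```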

KaiyiZhang-uo commented 1 year ago

@jurevreca12 I very much appreciate your feedback, and I have learned so much. Thanks a lot.

The current problem is that I have no idea how to get the scale of the activations tensor. From my understanding, for a fully connected layer, we have the input tensor, the weight tensor, and the output tensor (i.e., the activations tensor). To achieve integer-arithmetic-only matrix multiplication, we need the scales of all three tensors. We have already discussed how to quantize the input tensor in this issue. It is easy to get the scale of the weight tensor because quantized_bits() has a scale attribute:

dense_weight = quantized_bits(bits=8, integer=7, alpha='auto')
print(dense_weight.scale)

So, the only problem is getting the scale of the activations tensor. I tried two ways:

  1. Normally, I add activation layers using model.add(QActivation("quantized_relu")). Unlike quantized_bits(), quantized_relu has no scale attribute. I understand this, because to quantize an activation layer we normally need representative data to estimate the activations' quantization range, and QKeras does not support the use of a representative dataset.

  2. I tried to use QAdaptiveActivation() as the activation layer, which can estimate the EMA of the min and max of the activation values, i.e., the quantization range of the activations tensor. In this way, we don't need a representative dataset anymore. However, QAdaptiveActivation() also has no scale attribute.

    So, is there any way that I can get the scale of the activations tensor?

jurevreca12 commented 1 year ago

The scale of the activation tensor is determined by the scales of the two multiplicands: the weights and the inputs. There is a nice paper from Google that explains this well: https://arxiv.org/pdf/1712.05877.pdf.
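
A toy sketch of that idea (all numbers here are made up; see the paper for the exact scheme):

```python
# A real value r is represented as r = s * q with integer q and float scale s.
# For a dense layer, the integer accumulator carries the scale s_in * s_w and
# is rescaled to a chosen output scale s_out.
import numpy as np

s_in, s_w, s_out = 1 / 128, 1 / 64, 1 / 1024     # example scales
x_q = np.array([10, -20, 30], dtype=np.int32)    # quantized input
w_q = np.array([[1, 2, 3],
                [4, 5, 6]], dtype=np.int32)      # quantized weights

acc = w_q @ x_q                                  # integer accumulate
multiplier = (s_in * s_w) / s_out                # requantization factor
y_q = np.round(acc * multiplier).astype(np.int32)  # quantized output
print(y_q * s_out)                               # approximate real-valued output
```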

Btw., in the last answer I said the input quantizer should be quantized_bits(bits=8, integer=0, keep_negative=False) for MNIST. This is wrong; it should be quantized_bits(bits=8, integer=8, keep_negative=False).
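
Putting the MNIST recipe from the earlier comment together with this correction, a minimal sketch could look like the following (the hidden-layer width and the weight/activation quantizers are illustrative assumptions):

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Flatten
from qkeras import QActivation, QDense, quantized_bits, quantized_relu

model = Sequential([
    Input(shape=(28, 28)),
    Flatten(),
    # MNIST pixels take the values 0..255, which 8 unsigned bits cover exactly.
    QActivation(quantized_bits(bits=8, integer=8, keep_negative=False)),
    QDense(
        32,
        kernel_quantizer=quantized_bits(bits=8, integer=0, alpha=1),
        bias_quantizer=quantized_bits(bits=8, integer=0, alpha=1),
    ),
    QActivation(quantized_relu(bits=8, integer=4)),
    QDense(
        10,
        kernel_quantizer=quantized_bits(bits=8, integer=0, alpha=1),
        bias_quantizer=quantized_bits(bits=8, integer=0, alpha=1),
    ),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```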