lovodkin93 opened this issue 3 years ago
A QAT model is only for training, so the weights of a QAT model are in float form. For 8-bit quantization, we do the actual quantization during TFLite conversion. (Inference on the QAT model also just simulates the quantized model; it is not actually quantized. We use float32 ops with fake-quant to simulate it.)
You can get the fake-quantized weights manually, but they are still dequantized float32 (coming from fake-quant):
```python
unquantized_weight, quantizer, quantizer_vars = q_aware_model.layers[2]._weight_vars[0]
print(quantizer(unquantized_weight, training=False, weights=quantizer_vars))
```
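For context, a minimal sketch of the conversion step mentioned above (the toy model is illustrative, not from this thread); only the TFLite converter produces actually quantized int8 weights:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Illustrative toy model; any QAT-supported Keras model would do.
base_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])
q_aware_model = tfmot.quantization.keras.quantize_model(base_model)
# ... compile and fit q_aware_model as usual; its stored weights stay float32 ...

# Actual int8 quantization happens only here, at conversion time.
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
```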
@Xhark Ok, I think I understand. So I have a follow-up question: is the validation done using the quantized version of the model (i.e., with the fake-quant weights), or is it done with the unquantized weights?
@Xhark
So I tried what you suggested, and it appears the quantizer (the second element of q_aware_model.layers[2]._weight_vars[0]) is None. This is strange, given that the layer is wrapped in the QuantizeWrapper and does have min_var and max_var values.
Do you happen to know what might cause the quantizer to be None?
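For reference, a sketch of how those min/max variables can be listed (the layer index is a placeholder):

```python
wrapper = q_aware_model.layers[2]  # a QuantizeWrapper
# The quantizer's range variables appear among the wrapper's weights.
print([v.name for v in wrapper.weights if 'min' in v.name or 'max' in v.name])
```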
@Xhark So after checking, it appears that when I access the quantizer after reloading the model's parameters, namely:

```python
checkpoint_path = "/home/taaviv/dl-quantization/post_train_quant/best_checkpoint/cp.ckpt"
model = tf.keras.models.load_model(checkpoint_path)
unquantized_weight, quantizer, quantizer_vars = model.layers[2]._weight_vars[0]
```

the quantizer is None, in contrast to accessing the trained model before saving it (that is, not going through the save-and-reload phase, but accessing the _weight_vars[0] parameter immediately after training), namely:

```python
model = resnet50()
model.compile(...)
model.fit(...)
unquantized_weight, quantizer, quantizer_vars = model.layers[2]._weight_vars[0]
```

where quantizer seems to be of type Quantizer, and not None.
Do you happen to know what might cause this behaviour?
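As a hedged aside: the tfmot guide loads quantized Keras models inside quantize_scope so that the quantization wrappers and quantizers can be deserialized, which may be relevant here. A sketch of the same reload under that scope:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Same hypothetical checkpoint as above, reloaded inside quantize_scope.
with tfmot.quantization.keras.quantize_scope():
    model = tf.keras.models.load_model(checkpoint_path)
```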
Hello, I have tried to make my model compatible with QAT, according to your guideline. I started by defining a QuantizeConfig class:
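A representative 4-bit QuantizeConfig in the spirit of the tfmot comprehensive guide (the class name and quantizer settings here are illustrative, not the exact original code):

```python
import tensorflow_model_optimization as tfmot

LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer

class Conv4BitQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    """Quantize a Conv2D's weights and activations to 4 bits (illustrative)."""

    def get_weights_and_quantizers(self, layer):
        # 4-bit, per-tensor, symmetric quantizer for the kernel.
        return [(layer.kernel, LastValueQuantizer(
            num_bits=4, symmetric=True, narrow_range=False, per_axis=False))]

    def get_activations_and_quantizers(self, layer):
        # 4-bit quantizer for the layer's activation function.
        return [(layer.activation, MovingAverageQuantizer(
            num_bits=4, symmetric=False, narrow_range=False, per_axis=False))]

    def set_quantize_weights(self, layer, quantize_weights):
        layer.kernel = quantize_weights[0]

    def set_quantize_activations(self, layer, quantize_activations):
        layer.activation = quantize_activations[0]

    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}
```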
Then I applied the quantization:
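A sketch of the annotate-and-apply step, assuming the config class above (`base_model` is a placeholder for the original float model):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

def annotate_conv(layer):
    # Annotate only Conv2D layers with the custom 4-bit config.
    if isinstance(layer, tf.keras.layers.Conv2D):
        return tfmot.quantization.keras.quantize_annotate_layer(
            layer, quantize_config=Conv4BitQuantizeConfig())
    return layer

annotated_model = tf.keras.models.clone_model(base_model, clone_function=annotate_conv)

# The custom config must be in scope for quantize_apply to deserialize it.
with tfmot.quantization.keras.quantize_scope(
        {'Conv4BitQuantizeConfig': Conv4BitQuantizeConfig}):
    q_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)
```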
Finally, I chose one of the Conv2D layers, which evidently was quantized:
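An illustrative version of that check (the layer index is a placeholder):

```python
q_layer = q_aware_model.layers[2]  # one of the quantized Conv2D layers
print(type(q_layer))               # tfmot's QuantizeWrapper around the Conv2D
print([v.name for v in q_layer.weights])  # includes kernel min/max variables
```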
and looked at its weights, and it appears they have not been quantized to a 4-bit encoding, as I specified in the QuantizeConfig:
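An illustrative version of that inspection; consistent with the explanation at the top of the thread, the stored kernel remains float32 because fake-quant is applied on the fly rather than to the stored values:

```python
kernel = q_layer.layer.kernel        # the wrapped Conv2D's stored kernel
print(kernel.dtype)                  # float32
print(kernel.numpy().flatten()[:5])  # arbitrary floats, not 16 discrete levels
```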
Am I making an error somewhere? Or is there another way to check whether the layer was quantized? Thanks!