tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

quantization not happening? #829

Open lovodkin93 opened 3 years ago

lovodkin93 commented 3 years ago

Hello, I have tried to make my model compatible with QAT, following your guide. I started by defining a QuantizeConfig class:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer

class DefaultConv2DQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # Configure how to quantize weights.
    def get_weights_and_quantizers(self, layer):
      return [(layer.kernel, LastValueQuantizer(num_bits=4, symmetric=True, narrow_range=False, per_axis=False))]

    # Skip quantizing activations.
    def get_activations_and_quantizers(self, layer):
      return []

    def set_quantize_weights(self, layer, quantize_weights):
      # Add a line here for each item returned in `get_weights_and_quantizers`,
      # in the same order.
      layer.kernel = quantize_weights[0]

    def set_quantize_activations(self, layer, quantize_activations):
      # Empty since `get_activations_and_quantizers` returns an empty list.
      return

    # Configure how to quantize outputs (may be equivalent to activations).
    def get_output_quantizers(self, layer):
      return [MovingAverageQuantizer(num_bits=4, symmetric=False, narrow_range=False, per_axis=False)]

    def get_config(self):
      return {}

Then I applied the quantization:

def apply_mix_precision_QAT2(layer):
  # if isinstance(layer, tf.keras.layers.Dense):
  if isinstance(layer, tf.keras.layers.Conv2D):
    return tfmot.quantization.keras.quantize_annotate_layer(layer, quantize_config=DefaultConv2DQuantizeConfig())
  return layer

annotated_model = tf.keras.models.clone_model(model, clone_function=apply_mix_precision_QAT2)
with tfmot.quantization.keras.quantize_scope({'DefaultConv2DQuantizeConfig': DefaultConv2DQuantizeConfig}):
  model = tfmot.quantization.keras.quantize_apply(annotated_model)
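
After that, I compile and train the quantized model as usual before inspecting it (a minimal sketch; the optimizer, loss, and training data below are placeholders rather than my exact setup):

# Train the QAT model as usual; `train_images`/`train_labels` are placeholder names.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_images, train_labels, epochs=1, validation_split=0.1)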

Finally, I picked one of the Conv2D layers, which had evidently been wrapped for quantization (see the attached screenshot),

and looked at its weights, and it appears they have not been quantized to a 4-bit encoding as I specified in the QuantizeConfig (see the attached screenshot).
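
Roughly, this is how I inspected the layer (a sketch only; the layer index is an example, and I am assuming the quantize wrapper keeps the original layer under its .layer attribute):

wrapped = model.layers[2]              # one of the annotated Conv2D layers (index is an example)
print(type(wrapped).__name__)          # prints the tfmot quantize wrapper class if annotation worked
print(wrapped.layer.kernel.numpy())    # kernel values are still arbitrary float32, not 4-bit levels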

Am I doing something wrong? Or is there another way to check whether the layer was quantized? Thanks!

Xhark commented 3 years ago

The QAT model is only for training, so the weights of a QAT model are kept in float form. For 8-bit quantization, the actual quantization is done during TFLite conversion. (Inference on the QAT model also only simulates the quantized model; it is not actually quantized. We use float32 ops with fake-quant to simulate it.)

You can get the fake-quantized weights manually, but they are still dequantized float32 (coming from the fake-quant op):

unquantized_weight, quantizer, quantizer_vars = q_aware_model.layers[2]._weight_vars[0]
print(quantizer(unquantized_weight, training=False, weights=quantizer_vars))
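
For reference, the conversion step where the actual quantization happens looks roughly like this (a minimal sketch of the standard 8-bit post-QAT conversion path, assuming q_aware_model is the model returned by quantize_apply):

import tensorflow as tf

# Actual (integer) quantization of the weights happens during this conversion.
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

with open('quantized_model.tflite', 'wb') as f:
    f.write(quantized_tflite_model)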

lovodkin93 commented 3 years ago

@Xhark Ok, I think I understand. So I have a follow-up question: is validation done using the quantized version of the model (with the fake-quant weights), or with the unquantized weights?

lovodkin93 commented 3 years ago

@Xhark So I tried what you suggested, and it appears the quantizer (the second element of q_aware_model.layers[2]._weight_vars[0]) is None. This is strange, given that the model is wrapped in the quantize wrapper and does have min_var and max_var values. Do you happen to know what might cause the quantizer to be None?

lovodkin93 commented 3 years ago

@Xhark So after checking, it appears that when I access the quantizer after reloading the model from a checkpoint, namely:

checkpoint_path="/home/taaviv/dl-quantization/post_train_quant/best_checkpoint/cp.ckpt"
model = tf.keras.models.load_model(checkpoint_path)
unquantized_weight, quantizer, quantizer_vars = model.layers[2]._weight_vars[0]

the quantizer is None. This is in contrast to accessing the trained model before saving it (that is, not going through the save-and-reload phase, but reading the _weight_vars[0] attribute immediately after training), namely:

model = resnet50()
model.compile(...)
model.fit(...)
unquantized_weight, quantizer, quantizer_vars = model.layers[2]._weight_vars[0]

the quantizer seems to be of type Quantizer, and not None. Do you happen to know what might cause this behaviour?