tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

QAT (quantization aware training) Support quantizing models recursively #377

Open CRosero opened 4 years ago

CRosero commented 4 years ago

Describe the bug I'm doing transfer learning and would like to quantize my model at the end. The problem is that when I try to use the quantize_model() function (which is used successfully in numerous tutorials and videos), I get an error. How am I supposed to do quantization for transfer learning (using a previously built model as a feature extractor)?

System information

TensorFlow installed from (source or binary): pip

TensorFlow version: tf-nightly 2.2.0

TensorFlow Model Optimization version: 0.3.0

Python version: 3.7.7

Describe the expected behavior I expect the model to be successfully quantized and for no error messages to appear.

Describe the current behavior I get the error: "ValueError: Quantizing a tf.keras Model inside another tf.keras Model is not supported."

Code to reproduce the issue Can be found here

kmkolasinski commented 4 years ago

My workaround is to quantize both models separately and then combine them into a normal Keras model:

q_base_model = quantize_model(base_model)
q_head_model = quantize_model(head_model)
inputs = Input(...)
h = q_base_model(inputs)
outputs = q_head_model(h)
full_model = Model(inputs, outputs)
full_model.compile(...)
full_model.fit(...)

I'm not sure if this is the correct approach, but it works for me.

miaout17 commented 4 years ago

@alanchiao @nutsiepully Could you take a look? Thanks!

nutsiepully commented 4 years ago

Hi @CRosero,

We haven't added support for quantizing Keras models within models yet. This is possible, and something we intend to do in the future.

In the meantime, @kmkolasinski is right. That's the approach you have to use when nesting models: just quantize each of the models you are interested in.

Thanks @kmkolasinski!

Kyle719 commented 4 years ago

Thanks! @nutsiepully @kmkolasinski

Does quantizing the models recursively and then combining them not produce a fully quantized model?

base_model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
])

head_model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(None, 2028)),
  keras.layers.Dense(10, activation=tf.nn.softmax)
])

quantize_model = tfmot.quantization.keras.quantize_model

q_base_model = quantize_model(base_model)
q_head_model = quantize_model(head_model)

q_full_model = keras.Sequential([q_base_model, q_head_model])

q_full_model.compile(...)
q_full_model.fit(...)

converter = tf.lite.TFLiteConverter.from_keras_model(q_full_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

When I tried to convert it, I got the error message:

ValueError("Unsupported tf.dtype {0}".format(tf_dtype))

Is q_full_model not fully quantized?

nutsiepully commented 4 years ago

Hi @Kyle719,

I tried reproducing this, but I didn't see any errors. It converted just fine.

Please make sure you use tf-nightly. This should explain how the conversion is done.

willbattel commented 4 years ago

Hey @nutsiepully, thanks for the insight. Would you mind keeping this issue up to date with any changes in status, priority, roadmap, etc. regarding this capability moving forward? Thanks!

nutsiepully commented 4 years ago

Will update it once we add support for it.

CRosero commented 4 years ago

@kmkolasinski Thanks for your suggestion. I am trying it out but unfortunately not getting it to work. My code looks similar to that of @Kyle719, but I'm already getting a ValueError on q_head_model = quantize_model(head_model), saying

model must contain at least one layer which have been annotated with quantize_annotate*. There are no layers to quantize.

In the saved versions of the colab, this is labeled as "Initial attempt". Even after adding what the error suggests (version "quantize_annotate change"), the error doesn't go away.

@nutsiepully and others, do you happen to have any suggestions for a solution? (FYI, I shared the link so you can try the code and the corresponding attempts out on the colab directly; hope that makes it easier.)

nutsiepully commented 4 years ago

@CRosero - I fixed the code in your colab. Your Sequential model has not been constructed correctly - it was missing parentheses. It does not actually have any layers. That's why it was failing.

Also, after quantize_annotate..., you only need to call quantize_apply, not quantize_model again, though the latter still happens to work here.
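For reference, the annotate-then-apply flow looks roughly like this (a minimal sketch on a toy Sequential model; the layers and sizes are only illustrative):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

annotate = tfmot.quantization.keras.quantize_annotate_layer

# Annotate the layers that should be quantized while building the model...
annotated_model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(784,)),
    annotate(tf.keras.layers.Dense(64, activation='relu')),
    annotate(tf.keras.layers.Dense(10)),
])

# ...then call quantize_apply once; quantize_model is only the convenience
# path for quantizing a whole un-annotated model.
q_model = tfmot.quantization.keras.quantize_apply(annotated_model)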

I understand the complexity of using a new API, but it's generally not feasible for me to debug user code.

CRosero commented 4 years ago

Thank you very much @nutsiepully for your patient help! I didn't notice that at all. I made the corresponding changes and now it's working :)

Kyle719 commented 4 years ago

Thanks @nutsiepully! 'Transfer learning + QAT' is working well with the code below (I used VGG19 because it has no batch normalization layers, which are not yet supported for QAT).

I have one more question now: how can I follow the steps introduced on the TensorFlow page? https://www.tensorflow.org/model_optimization/guide/quantization/training_example

The steps:

  1. Train the model (no quantization involved)
  2. Fine-tune with quantization-aware training for just one epoch
  3. Convert it to TFLite

Is it possible to fine-tune a model with QAT by quantizing the models recursively?

base_model = tf.keras.applications.VGG19(input_shape=IMG_SHAPE, include_top=False, weights='imagenet')

head_model = tf.keras.Sequential([
  tf.keras.layers.InputLayer(input_shape=(5, 5, 512)),
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(1)
])

import tensorflow_model_optimization as tfmot
quantize_model = tfmot.quantization.keras.quantize_model

q_base_model = quantize_model(base_model)
q_head_model = quantize_model(head_model)

original_inputs = tf.keras.Input(IMG_SHAPE)
output1 = q_base_model(original_inputs)
output2 = q_head_model(output1)

q_aware_model = tf.keras.Model(inputs=original_inputs, outputs=output2)

base_learning_rate = 0.0001
q_aware_model.compile(optimizer=tf.keras.optimizers.RMSprop(lr=base_learning_rate),
                      loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                      metrics=['accuracy'])

initial_epochs = 1
validation_steps = 20

history = q_aware_model.fit(train_batches, epochs=initial_epochs)

converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

import os, tempfile
_, quant_file = tempfile.mkstemp('.tflite')
with open(quant_file, 'wb') as f:
  f.write(quantized_tflite_model)
print("Quantized model in Mb:", os.path.getsize(quant_file) / float(2**20))

teijeong commented 3 years ago

Hi @Xhark, can you comment on whether nested models are supported now?

Xhark commented 3 years ago

We don't support fully recursive quantization, but you can now quantize a model that contains a sub-model.

e.g.:

q_base_model = quantize_model(base_model)

original_inputs = tf.keras.Input(IMG_SHAPE)
x = q_base_model(original_inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
output = tf.keras.layers.Dense(1)(x)

model = tf.keras.Model(inputs=original_inputs, outputs=output)

q_aware_model = quantize_model(q_base_model)

--

This example was not supported before, but it works now.

nutsiepully commented 3 years ago

Thanks @Xhark.

Seems to me the last line q_aware_model = quantize_model(q_base_model) is not needed. q_base_model is already quantized, right?

Xhark commented 2 years ago

q_base_model is already quantized, but the last line is needed to quantize the layers outside of q_base_model (the GlobalAveragePooling2D and Dense layers).
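A minimal sketch of that flow, assuming the final quantize_model call is meant to take the outer model so that the layers outside q_base_model get annotated and quantized as well:

q_base_model = quantize_model(base_model)

original_inputs = tf.keras.Input(IMG_SHAPE)
x = q_base_model(original_inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
output = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs=original_inputs, outputs=output)

# Quantize the outer model too, so the GlobalAveragePooling2D and Dense layers are covered.
q_aware_model = quantize_model(model)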

aqibsaeed commented 2 years ago

This works for me; maybe it is useful for someone!

import tensorflow as tf
import tensorflow_model_optimization as tfmot

def create_quantization_model(model):
  layers = []
  for i in range(len(model.layers)):
    if isinstance(model.layers[i], tf.keras.models.Model):
      # Nested sub-model: clone it with annotations, then quantize it as a whole.
      quant_sub_model = tf.keras.models.clone_model(model.layers[i], clone_function=apply_quantization)
      layers.append(tfmot.quantization.keras.quantize_apply(quant_sub_model))
    else:
      # Plain layer: just annotate it.
      layers.append(apply_quantization(model.layers[i]))
  quant_model = tf.keras.models.Sequential(layers)
  return quant_model

def apply_quantization(layer):
  # Annotate only the layer types to be quantized; return everything else unchanged.
  if isinstance(layer, tf.keras.layers.Dense):
    return tfmot.quantization.keras.quantize_annotate_layer(layer)
  return layer

frytoli commented 2 years ago

Any tips on quantizing the Pix2Pix generator? I've used this official tutorial as a guide, and have attempted the following to no avail:

def downsample(filters, size, apply_batchnorm=True):
  initializer = tf.random_normal_initializer(0., 0.02)

  result = tf.keras.Sequential()
  result.add(
      tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                             kernel_initializer=initializer, use_bias=False)
      ) 
    )

  if apply_batchnorm:
    result.add(
      tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.BatchNormalization()
      )
    )

  result.add(
      tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.LeakyReLU()
      )
  )

  return result

def upsample(filters, size, apply_dropout=False):
  initializer = tf.random_normal_initializer(0., 0.02)

  result = tf.keras.Sequential()
  result.add(
    tfmot.quantization.keras.quantize_annotate_layer(
      tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
                                      padding='same',
                                      kernel_initializer=initializer,
                                      use_bias=False)
      )
  )

  result.add(
      tfmot.quantization.keras.quantize_annotate_layer(
          tf.keras.layers.BatchNormalization()
      )
  )

  if apply_dropout:
      result.add(
        tfmot.quantization.keras.quantize_annotate_layer(
          tf.keras.layers.Dropout(0.5)
        )
      )

  result.add(
    tfmot.quantization.keras.quantize_annotate_layer(
      tf.keras.layers.ReLU()
    )
  )

  return result

def Generator():
  inputs = tf.keras.layers.Input(shape=[512, 512, 3]) # Old: 256

  down_stack = [
    downsample(128, 4, apply_batchnorm=False),  # (batch_size, 128, 128, 64)
    downsample(256, 4),  # (batch_size, 64, 64, 128)
    downsample(512, 4),  # (batch_size, 32, 32, 256)
    downsample(1024, 4),  # (batch_size, 16, 16, 512)
    downsample(1024, 4),  # (batch_size, 8, 8, 512)
    downsample(1024, 4),  # (batch_size, 4, 4, 512)
    downsample(1024, 4),  # (batch_size, 2, 2, 512)
    downsample(1024, 4),  # (batch_size, 1, 1, 512)
  ]

  up_stack = [
    upsample(1024, 4, apply_dropout=True),  # (batch_size, 2, 2, 1024)
    upsample(1024, 4, apply_dropout=True),  # (batch_size, 4, 4, 1024)
    upsample(1024, 4, apply_dropout=True),  # (batch_size, 8, 8, 1024)
    upsample(1024, 4),  # (batch_size, 16, 16, 1024)
    upsample(512, 4),  # (batch_size, 32, 32, 512)
    upsample(256, 4),  # (batch_size, 64, 64, 256)
    upsample(128, 4),  # (batch_size, 128, 128, 128)
  ]

  initializer = tf.random_normal_initializer(0., 0.02)
  last = tf.keras.layers.Conv2DTranspose(OUTPUT_CHANNELS, 4,
                                         strides=2,
                                         padding='same',
                                         kernel_initializer=initializer,
                                         activation='tanh')  # (batch_size, 256, 256, 3)

  x = inputs

  # Downsampling through the model
  skips = []
  for down in down_stack:
    x = down(x)
    skips.append(x)

  skips = reversed(skips[:-1])

  # Upsampling and establishing the skip connections
  for up, skip in zip(up_stack, skips):
    x = up(x)
    x = tf.keras.layers.Concatenate()([x, skip])

  x = last(x)

  # Model
  model = tf.keras.Model(inputs=inputs, outputs=x)

  # Quantize
  q_model = tfmot.quantization.keras.quantize_apply(model)

  return q_model

This current setup gives me the error "ValueError: model must contain at least one layer which have been annotated with quantize_annotate*. There are no layers to quantize." Then, when I call quantize_apply on the Sequential models in the up/downsample functions, the error changes to "ValueError: model must be a built model. been built yet. Please call model.build(input_shape) before quantizing your model" (which makes sense). Is it possible to quantize with this model structure? Thanks in advance!

aqibsaeed commented 2 years ago

Can you try creating your model without any quantization first? Then call q_model = tf.keras.models.clone_model(model, clone_function=apply_quantization), where apply_quantization should annotate every layer you want to quantize with tfmot.quantization.keras.quantize_annotate_layer.
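Rounding that out with the quantize_apply step from the create_quantization_model helper above, the flow looks roughly like this (a sketch assuming a generic Keras model named model; the Dense-only filter is just illustrative):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

def apply_quantization(layer):
  # Annotate only the layer types you want quantized; leave everything else unchanged.
  if isinstance(layer, tf.keras.layers.Dense):
    return tfmot.quantization.keras.quantize_annotate_layer(layer)
  return layer

# Rebuild the model with annotations applied layer by layer...
annotated_model = tf.keras.models.clone_model(model, clone_function=apply_quantization)

# ...then make the annotated layers quantization aware.
q_model = tfmot.quantization.keras.quantize_apply(annotated_model)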

frytoli commented 2 years ago

Thanks for the quick response! That doesn't throw an error, but it doesn't look like it quantizes the layers created within the upsample and downsample functions. Is there any way to also get those layers?

Model: "model_2"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_3 (InputLayer)           [(None, 512, 512, 3  0           []                               
                                )]                                                                

 sequential_34 (Sequential)     (None, 256, 256, 12  6144        ['input_3[0][0]']                
                                8)                                                                

 sequential_35 (Sequential)     (None, 128, 128, 25  525312      ['sequential_34[1][0]']          
                                6)                                                                

 sequential_36 (Sequential)     (None, 64, 64, 512)  2099200     ['sequential_35[1][0]']          

 sequential_37 (Sequential)     (None, 32, 32, 1024  8392704     ['sequential_36[1][0]']          
                                )                                                                 

 sequential_38 (Sequential)     (None, 16, 16, 1024  16781312    ['sequential_37[1][0]']          
                                )                                                                 

 sequential_39 (Sequential)     (None, 8, 8, 1024)   16781312    ['sequential_38[1][0]']          

 sequential_40 (Sequential)     (None, 4, 4, 1024)   16781312    ['sequential_39[1][0]']          

 sequential_41 (Sequential)     (None, 2, 2, 1024)   16781312    ['sequential_40[1][0]']          

 sequential_42 (Sequential)     (None, 4, 4, 1024)   16781312    ['sequential_41[1][0]']          

 concatenate_14 (Concatenate)   (None, 4, 4, 2048)   0           ['sequential_42[1][0]',          
                                                                  'sequential_40[1][0]']          

 sequential_43 (Sequential)     (None, 8, 8, 1024)   33558528    ['concatenate_14[1][0]']         

 concatenate_15 (Concatenate)   (None, 8, 8, 2048)   0           ['sequential_43[1][0]',          
                                                                  'sequential_39[1][0]']          

 sequential_44 (Sequential)     (None, 16, 16, 1024  33558528    ['concatenate_15[1][0]']         
                                )                                                                 

 concatenate_16 (Concatenate)   (None, 16, 16, 2048  0           ['sequential_44[1][0]',          
                                )                                 'sequential_38[1][0]']          

 sequential_45 (Sequential)     (None, 32, 32, 1024  33558528    ['concatenate_16[1][0]']         
                                )                                                                 

 concatenate_17 (Concatenate)   (None, 32, 32, 2048  0           ['sequential_45[1][0]',          
                                )                                 'sequential_37[1][0]']          

 sequential_46 (Sequential)     (None, 64, 64, 512)  16779264    ['concatenate_17[1][0]']         

 concatenate_18 (Concatenate)   (None, 64, 64, 1024  0           ['sequential_46[1][0]',          
                                )                                 'sequential_36[1][0]']          

 sequential_47 (Sequential)     (None, 128, 128, 25  4195328     ['concatenate_18[1][0]']         
                                6)                                                                

 concatenate_19 (Concatenate)   (None, 128, 128, 51  0           ['sequential_47[1][0]',          
                                2)                                'sequential_35[1][0]']          

 sequential_48 (Sequential)     (None, 256, 256, 12  1049088     ['concatenate_19[1][0]']         
                                8)                                                                

 concatenate_20 (Concatenate)   (None, 256, 256, 25  0           ['sequential_48[1][0]',          
                                6)                                'sequential_34[1][0]']          

 quantize_annotate_28 (Quantize  (None, 512, 512, 3)  12291      ['concatenate_20[1][0]']         
 Annotate)                                                                                        

==================================================================================================
Total params: 217,641,475
Trainable params: 217,619,715
Non-trainable params: 21,760
__________________________________________________________________________________________________

aqibsaeed commented 2 years ago

I think quantization does not really recurse into models that contain other models (in your case, the main model contains other Sequential models). Did you try passing your model to the create_quantization_model(model) function mentioned here? I think the solution is to iterate over the model's layers and, whenever you encounter a Sequential sub-model, iterate over its layers too in order to annotate them.
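A rough sketch of that nested-iteration idea (hypothetical helper names; the layer-type filter is illustrative, and the outer graph, including the Concatenate skip connections, still has to be rebuilt around the quantized blocks):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

def annotate_layer(layer):
  # Illustrative filter: annotate only the layer types we want quantized.
  if isinstance(layer, (tf.keras.layers.Conv2D,
                        tf.keras.layers.Conv2DTranspose,
                        tf.keras.layers.Dense)):
    return tfmot.quantization.keras.quantize_annotate_layer(layer)
  return layer

def quantize_block(block, input_shape):
  # A nested Sequential (e.g. one downsample()/upsample() block) gets its own
  # annotation pass and is quantized as a unit.
  annotated = tf.keras.models.clone_model(block, clone_function=annotate_layer)
  # quantize_apply requires a built model (see the "must be a built model" error above).
  annotated.build(input_shape)
  return tfmot.quantization.keras.quantize_apply(annotated)

Each block in down_stack/up_stack could then be replaced with quantize_block(block, input_shape) for the shape it receives, before the Generator graph is wired back together.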

frytoli commented 2 years ago

I did manage to get that working with some additional layers in the apply_quantization function (I'm still learning here!). But I receive the following error, I think due to the Concatenate layers between the sub-models:

ValueError: A merge layer should be called on a list of inputs. Received: inputs=Tensor("Placeholder:0", shape=(None, 4, 4, 1024), dtype=float32) (not a list of tensors)

I've also tested changing quant_model = tf.keras.Sequential(layers) to quant_model = tf.keras.Model(layers) in apply_quantization, and it runs without issue. However, when I then call the new quantized model and attempt to view its summary, like this: q_model(inputs=inputs), I receive this error:

Unimplemented `tf.keras.Model.call()`: if you intend to create a `Model` with the Functional API, please provide `inputs` and `outputs` arguments. Otherwise, subclass `Model` with an overridden `call()` method.

Thanks again for your help.

aqibsaeed commented 2 years ago

Great. Check this to implement call(): https://keras.io/guides/customizing_what_happens_in_fit/ There is a GAN example at the end of the page that should be useful.

frytoli commented 2 years ago

Wonderful! Thanks again!

ashwinv99 commented 2 years ago

Hi @frytoli, what changes did you make in the apply_quantization function to apply quantization to the sub-modules (the conv layers of the upsample and downsample blocks) too? I am facing a similar issue in my problem.