Closed kalaluthien closed 4 years ago
Can you check MobileNetV2 quantization-aware training followed by post-training integer quantization too? I failed to reproduce the results in the official guide and overview using tf2.2.0 and tfmot 0.3.0. (Were they measured using TF1.x?)
It would be great if you could share the experiment settings needed to reproduce the speedup and accuracy comparison between float and quantized MobileNetV2! (Or are the experiments not reproducible with current TF2.x?)
Notebook: https://gist.github.com/kalaluthien/c44da9bb6d027fbca95a144e07179667#file-mobilenetv2_cifar10-ipynb
Summary:
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.experimental_new_converter = True
converter.experimental_new_quantizer = True
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(1)
representative_dataset_gen = get_representative_dataset(dataset, num_calibration_steps=100)
converter.representative_dataset = tf.lite.RepresentativeDataset(representative_dataset_gen)
quantized_tflite_model = converter.convert()
...
interpreter.invoke() # RuntimeError: Quantization not yet supported for op: DEQUANTIZE
Hi @kalaluthien,
Thanks for the well thought out and detailed bug report. Sorry for the delay in getting back - there's generally limited time I take out each week to look at github issues :)
I tried out both of these examples and they converted and ran just fine for me. Perhaps there is a flag you are missing during conversion. Check this file for conversion code.
Looking at your colab code, converter._experimental_new_quantizer = True is missing. Please try that and let me know how it goes.
This failure is expected. By default, our goal is to support built-in Keras layers, which basically means the layers under the tf.keras.layers module. TensorFlowOpLayer can be used to wrap any TF op, and it's not feasible to meaningfully cover every TF op.
The recommended approach here is to use built-in Keras layers to achieve this. So you can use tf.keras.layers.Add and tf.keras.layers.Reshape instead of using + and expand_dims. That should solve it.
If you really do want to use something else, it's the user's responsibility to provide an appropriate QuantizeConfig for that use.
This is again the same problem as TensorFlowOpLayer. And yes, you are right the existing pattern only matches Conv+BN+ReLU. The code likely became slow since it had added a bunch of Quant/Dequant ops in between. I don't think the converter is likely to match Conv/BN/(Add+Mul ops matching hardswish) either while folding.
The proper fix here would be to add support for hardswish. Can you please file a separate bug requesting HardSwish support? I'll take some time out to add it. We covered MobileNet v1/v2, so this is currently missing.
But we should be able to add support for this. We also need to ensure the converter is handling it properly.
Again, this works for me. That's how we got MobileNetV2 working and created the results. Perhaps this is the same issue as Average/MaxPooling. Please try the fix out and let me know if that works.
Regarding MobileNetV2 reproduction, looking at your code it seems you are training on CIFAR. It won't be as straight-forward to reproduce the full training.
We trained a Keras MobileNet V2 model with hyperparams from this. We then quantized the model and trained again for a few epochs.
I think the reason your conversion code is failing is due to
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
Try removing it, and I think conversion should work. If it doesn't, please let me know. Basically, the QAT conversion by default uses float inputs/outputs based on the model signature. There is work in progress in TFLiteConverterV2 to support a different model interface (int8/uint8, etc.).
See this.
Hope this helps.
Also, regarding HardSwish if you have the time and are interested, I'm happy to guide you in how to implement support for it :)
Thanks for great replies!
Hi, @nutsiepully. I've tested the questions above in the same environment (tf==2.2.0 + tfmot==0.3.0).
converter._experimental_new_quantizer = True is already in my code, so there must be another reason. (gist link) Code snippet:
for model in models:
    print(f'Convert "{model.name}"')
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # for post-training quantization
    converter.representative_dataset = calibration_gen  # for full-integer quantization
    converter._experimental_new_quantizer = True  # already here!
    with quantize.quantize_scope():  # is this right place for opening quantize_scope?
        tflite_model = converter.convert()
    tflite_models.append((tflite_model, model.name))
...
for model, name in tflite_models:
    try:
        interpreter = tf.lite.Interpreter(model_content=model)
        interpreter.allocate_tensors()
Error messages:
[TFLite] MnistAveragePooling2D error:
tensorflow/lite/kernels/pooling.cc:94 input->params.scale != output->params.scale (-2099980912 != 666249888)
Node number 2 (AVERAGE_POOL_2D) failed to prepare.
[TFLite] MnistMaxPooling2D error:
tensorflow/lite/kernels/pooling.cc:94 input->params.scale != output->params.scale (-2099980912 != 666250000)
Node number 2 (MAX_POOL_2D) failed to prepare.
[TFLite] MnistDenseAndGAP error:
tensorflow/lite/kernels/kernel_util.cc:129 std::abs(input_product_scale - bias_scale) <= 1e-6 * std::min(input_product_scale, bias_scale) was not true.
Node number 4 (FULLY_CONNECTED) failed to prepare.
The DEQUANTIZE error message is gone. Now it breaks at the preparation step of the fully-connected layer. (gist link) Code snippet:
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]  # <- commented out
# converter.experimental_new_converter = True  # <- commented out because `True` is default
converter._experimental_new_quantizer = True  # <- add '_' in front of variable name
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(1)
representative_dataset_gen = get_representative_dataset(dataset, num_calibration_steps=100)
converter.representative_dataset = tf.lite.RepresentativeDataset(representative_dataset_gen)
quantized_tflite_model = converter.convert()
...
interpreter = tf.lite.Interpreter(model_content=quantized_tflite_model)
interpreter.allocate_tensors()
Error message:
RuntimeError: tensorflow/lite/kernels/kernel_util.cc:129 std::abs(input_product_scale - bias_scale) <= 1e-6 * std::min(input_product_scale, bias_scale) was not true.
Node number 69 (FULLY_CONNECTED) failed to prepare.
This is the same error as in Q1, which seems to be a bug in the GAP+Dense combination.
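To make the failing check easier to read, here is a minimal Python sketch of the inequality quoted in the error message. It assumes that input_product_scale is the product of the input scale and the filter (weight) scale, which is what the name suggests; the function name and the scale values are made up for illustration and are not taken from the failing model.

```python
def bias_scale_is_consistent(input_scale, filter_scale, bias_scale):
    """Python mirror of the inequality quoted from kernel_util.cc.

    Assumption: input_product_scale = input_scale * filter_scale.
    """
    input_product_scale = input_scale * filter_scale
    tolerance = 1e-6 * min(input_product_scale, bias_scale)
    return abs(input_product_scale - bias_scale) <= tolerance


# Illustrative (made-up) scales.
matching = bias_scale_is_consistent(0.02, 0.005, 0.02 * 0.005)
mismatched = bias_scale_is_consistent(0.02, 0.005, 0.00011)
print(matching, mismatched)
```

Under that assumption, the FULLY_CONNECTED node fails to prepare whenever the bias was quantized with a scale that does not match the product of the input and weight scales.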
Add works! But for the addition, I used tf.add() and tf.multiply() because I need to add/multiply constant values element-wise to a dynamic tensor that has a None-valued batch dimension.
Is there another workaround to add constant values without using tf.add() and tf.lambda()?
In short:
x = tf.keras.Input(...)
shortcut = x
x = tf.keras.layers.ReLU(6.0)(x + 3.0) * 1.67  # any equivalent options when using only keras builtins?
x = tf.keras.layers.Multiply()([x, shortcut])
It'll be great if I can contribute to supporting h-swish, and then we can benchmark MobileNetV3.
Thanks!
Oh I'm sorry I made a mistake. I meant use
converter.experimental_new_converter = True
That's what was missing.
Can you try using tf-nightly to run your code? There might be some converter changes that aren't in TF 2.2.
I've successfully converted Dense+BN etc. I think if you use tf-nightly with experimental_new_converter=True, the conversion errors should go away.
As for HardSwish, I just looked into it a bit. There seem to be a few tricky pieces.
For starters, hard_swish has not been added as an activation in Keras yet. The goal of the tfmot library is to provide default behavior for all built-in Keras layers/activations. But since hard_swish is not a built-in activation yet, we can't really add a pattern matching it in the library code. It would need to be handled by the user.
I would recommend adding support for it in your code to begin with. Once hard_swish gets added, we can move this code internally. You should be able to file a bug on keras/tf to check whether they plan to add support for it.
You can create a class HardSwish(Layer) which gets added after Conv + BN. You should be able to use built-in Add and Multiply to do so.
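As a rough illustration of the user-side layer suggested above, the sketch below defines a hypothetical HardSwish layer and places it after Conv + BN. It is only one possible shape for such a layer: it computes x * ReLU6(x + 3) / 6 with tf.nn.relu6 inside call() rather than composing the built-in Add and Multiply layers, and it assumes the constant is 1/6. For quantization-aware training, a custom layer like this would still need a user-provided QuantizeConfig, which is not shown here.

```python
import tensorflow as tf


class HardSwish(tf.keras.layers.Layer):
    """Hypothetical user-defined layer computing x * ReLU6(x + 3) / 6."""

    def call(self, inputs):
        # tf.nn.relu6 clips to [0, 6]; dividing by 6 maps the gate to [0, 1].
        gate = tf.nn.relu6(inputs + 3.0) / 6.0
        return inputs * gate


def build_model():
    # Same Conv + BN prefix as the small test model in this thread.
    inp = tf.keras.Input(shape=(28, 28, 1))
    x = tf.keras.layers.Conv2D(32, 5)(inp)
    x = tf.keras.layers.BatchNormalization()(x)
    x = HardSwish()(x)
    return tf.keras.Model(inp, x)


# Quick numerical check of the layer on a few hand-picked values.
sample = tf.constant([[-4.0, 0.0, 3.0, 6.0]])
hswish_out = HardSwish()(sample).numpy().tolist()
model = build_model()
print(hswish_out)
```

Keeping the whole activation in one layer gives a single place to attach a QuantizeConfig, at the cost of hiding the Add and Mul from any pattern matching that looks for built-in layers.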
Next, to understand exactly what support needs to be added, we would need to understand how it executes in TFLite.
I created a simple model.
inp = tf.keras.Input(shape=(28, 28, 1))
x = tf.keras.layers.Conv2D(32, 5)(inp)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.ReLU(6.0)(x + 3.0) * 1.67
m = tf.keras.Model(inp, x)
m.save('hswish.h5')
Converted it using the following code.
conv = tf.lite.TFLiteConverter.from_keras_model(m)
def representative_dataset_gen():
    for _ in range(num_calibration_steps):
        # Yield a list with one float32 array per model input, including the batch dimension.
        yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]
conv.representative_dataset = representative_dataset_gen
conv.convert()
# saved as hswish.tflite
As you can see the converter fuses the Add into the bias, but the Mul comes after.
So you'll need to place the FakeQuant after the Add + ReLU but before the Mul. And likely use a transform similar to this. That should sort the issue out.
Thanks for your advice.
tf-nightly (2.3.0) solves every problem! (except input/output type after conversion: https://github.com/tensorflow/tensorflow/issues/38285, which is independent of these)
You mean I need to place the FakeQuant like: Conv2D-BN-[FakeQuant]-Add-ReLU-Mul-[FakeQuant]-OtherLayers with a customized transform using tfmot, and let TFLiteConverter convert { Conv2D-BN: FusedConv2D, Add-ReLU-Mul: HardSwish }?
Glad to know.
It will be Conv2D -> BN -> Add -> ReLU -> [FakeQuant] -> Mul. So the converter will fuse the BN + Add + ReLU into the Conv and the Mul will remain separate since it's after the ReLU.
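The ordering above can be sketched numerically in plain Python. This is only an illustration of where the quantize-dequantize step sits, not tfmot's implementation: fake_quant and hswish_tail are hypothetical helpers, the [0, 6] range and 8 bits are assumed values, and the multiplier is passed in by the caller.

```python
def fake_quant(x, min_val, max_val, num_bits=8):
    """Quantize-dequantize x onto a uniform grid over [min_val, max_val]."""
    levels = 2 ** num_bits - 1
    scale = (max_val - min_val) / levels
    clipped = min(max(x, min_val), max_val)
    q = round((clipped - min_val) / scale)  # integer grid index
    return min_val + q * scale


def hswish_tail(conv_bn_out, mul_const):
    """Add -> ReLU6 -> [FakeQuant] -> Mul, applied to a Conv+BN output value."""
    act = min(max(conv_bn_out + 3.0, 0.0), 6.0)  # Add + ReLU6
    act = fake_quant(act, 0.0, 6.0)              # FakeQuant before the Mul
    return act * mul_const                       # Mul stays a separate op


step = 6.0 / 255  # grid spacing for 8 bits over the assumed [0, 6] range
for value in (-5.0, -1.0, 0.5, 10.0):
    print(value, hswish_tail(value, 1.0 / 6.0))
```

Because the FakeQuant sits directly after the ReLU, the quantized range matches the fused Conv output, and the Mul only rescales values that are already on the 8-bit grid.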
Seems like this bug is solved. I'm closing it, please feel free to reopen.
You can start a new issue for the hardswish and we can continue our conversation there. Even if it's done in your code, it can remain an example for others to follow.
And it'll be pretty easy to incorporate into the library once you've implemented it. We can try and get HardSwish moved into Keras.
Thanks @kalaluthien for your patience and proactive use of the library.
Hi, I have run into the same problems as in this issue, but I still could not solve them. My environment is tf-nightly 2.3.0.dev20200522 and Python 3.8.
x = layers.ReLU(6.0)(x + 3) * (1 / 6)
When I try to quantize the ops '+' and '*', it fails and shows this:
RuntimeError: Layer tf_op_layer_AddV2:<class 'tensorflow.python.keras.engine.base_layer.TensorFlowOpLayer'> is not supported. You can quantize this layer by passing a tfmot.quantization.keras.QuantizeConfig instance to the quantize_annotate_layer API.
If I provide a QuantizeConfig, maybe I could solve it. However, even after reading the guide at https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide, I still don't know how to write the QuantizeConfig for tf.add and tf.multiply. I hope you can tell me how to write it, or give me some more examples.
Thank you very much!
Hello, is there any update? I'm trying to solve this problem too but haven't managed to yet.
Moreover, I'm very confused because hard-swish is defined here as x = tf.keras.layers.ReLU(6.0)(x + 3.0) * 1.67.
Actually, it is defined as x = x * tf.keras.layers.ReLU(6.0)(x + 3.0) * 1.67 in this paper.
Therefore, I think the converter does not fuse the Add into the bias as @nutsiepully stated.
Please see the code and pictures below.
inp = tf.keras.Input(shape=(28, 28, 1))
x = tf.keras.layers.Conv2D(32, 5)(inp)
x = tf.keras.layers.BatchNormalization()(x)
x = x * tf.keras.layers.ReLU(6.0)(x + 3.0) * 1.67
m = tf.keras.Model(inp, x)
m.save('hswish.h5')
conv = tf.lite.TFLiteConverter.from_keras_model(m)
def representative_dataset_gen():
    for _ in range(num_calibration_steps):
        # Yield a list with one float32 array per model input, including the batch dimension.
        yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]
conv.representative_dataset = representative_dataset_gen
conv.convert()
# saved as hswish.tflite
Am I missing something?
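The difference between the two expressions discussed here can be checked with a few lines of plain Python. The sketch assumes the constant written as 1.67 in this thread stands for 1/6 (about 0.167), which is the factor in the MobileNetV3 definition x * ReLU6(x + 3) / 6; the function names are illustrative only.

```python
def relu6(x):
    return min(max(x, 0.0), 6.0)


def hard_sigmoid_gate(x):
    """ReLU6(x + 3) / 6: the expression without the leading x."""
    return relu6(x + 3.0) / 6.0


def hard_swish(x):
    """x * ReLU6(x + 3) / 6: the input multiplies its own gate."""
    return x * hard_sigmoid_gate(x)


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={x:5.1f}  gate={hard_sigmoid_gate(x):.4f}  hswish={hard_swish(x):.4f}")
```

In the second form the Conv + BN output feeds both the Add and the final Mul, so the graph has a branch that the first form does not have.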
@yfthu You can replace tf.add and tf.multiply with tf.keras.layers.Add and tf.keras.layers.Multiply. But you still need a config for tf.keras.layers.Multiply. The config below worked for me. Please note that I did int8 quantization; in some cases you probably need to keep int32 there.
class MultQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # Configure how to quantize weights.
    def get_weights_and_quantizers(self, layer):
        return []

    # Configure how to quantize activations.
    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        pass

    def set_quantize_activations(self, layer, quantize_activations):
        pass

    # Configure how to quantize outputs (may be equivalent to activations).
    def get_output_quantizers(self, layer):
        return [
            tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
                num_bits=8, symmetric=False, narrow_range=False, per_axis=False)]

    def get_config(self):
        return {}
Basically, tf.keras.layers.Multiply is quite simple: it has no weights or activations, so you don't need to add any quantization there. The output range may differ depending on the inputs, though, so the output is the only place where quantization should be added.
But you can probably omit output quantization too (just return an empty list) if the inputs are already quantized. In that case, I think, the multiplication should happen in the precision of the inputs (which is usually int8 or int32).
Also, please note that I'm using MovingAverageQuantizer, which should adjust min/max based on a moving average. You may replace it with AllValuesQuantizer and see how it affects precision in your case.
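To give a feel for how the two range-tracking strategies can differ, here is a simplified pure-Python sketch. It is not tfmot's implementation: the class names, the decay value, and the initialization from the first batch are assumptions chosen to keep the arithmetic easy to follow.

```python
class AllValuesRange:
    """Keeps the smallest min and largest max seen across all batches."""

    def __init__(self):
        self.min_val = None
        self.max_val = None

    def update(self, batch):
        lo, hi = min(batch), max(batch)
        self.min_val = lo if self.min_val is None else min(self.min_val, lo)
        self.max_val = hi if self.max_val is None else max(self.max_val, hi)
        return self.min_val, self.max_val


class MovingAverageRange:
    """Keeps an exponential moving average of per-batch min and max."""

    def __init__(self, decay):
        self.decay = decay  # assumed value, chosen by the caller
        self.min_val = None
        self.max_val = None

    def update(self, batch):
        lo, hi = min(batch), max(batch)
        if self.min_val is None:
            self.min_val, self.max_val = lo, hi
        else:
            self.min_val = self.decay * self.min_val + (1 - self.decay) * lo
            self.max_val = self.decay * self.max_val + (1 - self.decay) * hi
        return self.min_val, self.max_val


# One outlier batch in the middle of otherwise small activations.
batches = [[-1.0, 1.0], [-10.0, 10.0], [-1.0, 1.0]]
all_values = AllValuesRange()
moving_avg = MovingAverageRange(decay=0.5)
for batch in batches:
    all_range = all_values.update(batch)
    avg_range = moving_avg.update(batch)
print(all_range)  # (-10.0, 10.0)
print(avg_range)  # (-3.25, 3.25)
```

In this toy run the all-values tracker keeps the outlier range permanently, while the moving average drifts back toward the typical range, which trades clipping of rare large values for finer resolution on common ones.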
Describe the requests
I am working with recent neural networks targeting mobile devices, and I found there are obstacles to performing integer quantization after QAT.
I know these APIs are not available now, but if you have plans to address following issues, please let me know when they will be available :)
AveragePooling2D
MaxPooling2D
Residual connection
HardSwish
GlobalAveragePooling-Dense
System information
TensorFlow installed from (source or binary): binary
TensorFlow version: 2.2.0 (release)
TensorFlow Model Optimization version: 0.3.0 (release)
Python version: 3.6.0
Code to reproduce the issue
Gist to reproduce full test: https://gist.github.com/kalaluthien/b270c71afb6866ae61ef0dc088a762f2