larq / compute-engine

Highly optimized inference engine for Binarized Neural Networks
https://docs.larq.dev/compute-engine
Apache License 2.0

DoReFa quantizer with higher number of MACs/Ops, Grouped convs as custom ops on LCE 0.7.0 #745

Open lluevano opened 2 years ago

lluevano commented 2 years ago

Hello, I have a couple of questions regarding quantizer options for Larq and LCE.

I am designing a BNN using the DoReFa quantizer; however, I noticed a very high number of estimated MACs and Ops when converting the model for ARM64. Changing the quantizer to "ste_sign" dramatically lowered the number of MACs and Ops.

I was wondering whether there is a way to use the DoReFa quantizer for training without incurring serious operation overhead when the model is converted and run for inference in LCE. Is the "ste_sign" quantizer the only viable option for efficient inference?
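For context, here is a minimal sketch of the two configurations I am comparing (the layer shapes, k_bit value, and DoReFaQuantizer arguments below are illustrative assumptions, not my actual model):

```python
import larq as lq
import tensorflow as tf

def build_model(kernel_quantizer):
    # Single binarised conv layer; input shape and filter count are illustrative.
    return tf.keras.Sequential([
        tf.keras.layers.Input((32, 32, 3)),
        lq.layers.QuantConv2D(
            64, 3,
            input_quantizer="ste_sign",
            kernel_quantizer=kernel_quantizer,
            kernel_constraint="weight_clip",
            use_bias=False,
        ),
    ])

# "ste_sign" binarises weights to {-1, +1}; this converts to fast binary ops.
ste_model = build_model("ste_sign")

# DoReFa-quantised weights (1-bit, mode="weights" assumed) -- this is the
# configuration that shows the much higher op/MAC estimate on conversion.
dorefa_model = build_model(lq.quantizers.DoReFaQuantizer(k_bit=1, mode="weights"))
```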

Thank you for the excellent work and for your attention.

lluevano commented 2 years ago

I noticed some issues with the latest version (0.7.0) only, not the previous one (0.6.2): grouped convolutions (FP or binary) are converted as custom ops in the latest version.

Example converter output for grouped (g=2) convs:

2022-07-26 13:06:17.469686: W external/org_tensorflow/tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1903] The following operation(s) need TFLite custom op implementation(s): Custom ops: Conv2D Details: tf.Conv2D(tensor<1x32x32x64xf32>, tensor<5x5x32x32xf32>) -> (tensor<1x11x11x32xf32>) : {data_format = "NHWC", dilations = [1, 1, 1, 1], explicit_paddings = [], padding = "SAME", strides = [1, 3, 3, 1], use_cudnn_on_gpu = true} See instructions: https://www.tensorflow.org/lite/guide/ops_custom
2022-07-26 13:06:17.469772: I external/org_tensorflow/tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1963] Estimated count of arithmetic ops: 5792 ops, equivalently 2896 MACs
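For reproducibility, a minimal sketch of the conversion step that triggers this warning (the model is hypothetical but matches the tensor shapes in the log above):

```python
import tensorflow as tf
import larq_compute_engine as lce

# Grouped (g=2) float convolution matching the shapes in the log:
# input 1x32x32x64, kernel 5x5x32x32, strides 3, SAME padding -> 1x11x11x32.
model = tf.keras.Sequential([
    tf.keras.layers.Input((32, 32, 64)),
    tf.keras.layers.Conv2D(32, 5, strides=3, padding="same", groups=2),
])

# On LCE 0.7.0 (TF 2.8) this emits the custom-op warning; on 0.6.2 it converts.
tflite_bytes = lce.convert_keras_model(model)
with open("grouped_conv.tflite", "wb") as f:
    f.write(tflite_bytes)
```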

Small quantizer example (2 QuantConv layers):

Example with ste_sign mode="weights":

2022-07-26 13:14:57.680246: I external/org_tensorflow/tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1963] Estimated count of arithmetic ops: 1.164 M ops, equivalently 0.582 M MACs

Changing to DoReFa mode="weights":

2022-07-26 13:16:05.771057: I external/org_tensorflow/tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1963] Estimated count of arithmetic ops: 1.663 M ops, equivalently 0.831 M MACs

I was able to successfully benchmark my model with DoReFa and grouped convolutions when it was converted with version 0.6.2, with better-than-expected efficiency, but not the one converted with version 0.7.0. I am using TensorFlow 2.8.0 and larq 0.12.2.

lgeiger commented 2 years ago

Sorry for the late reply.

I noticed some issues with the latest version (0.7.0) only, not the previous one (0.6.2): grouped convolutions (FP or binary) are converted as custom ops in the latest version.

Unfortunately this was an issue with TensorFlow 2.8, which LCE 0.7.0 uses under the hood. It has been fixed on master since we upgraded to TensorFlow 2.9, but we haven't published a new release containing the fix yet. Sorry about that. For now, I'd recommend sticking with 0.6.2 if grouped convolution support is required.

Is the "ste_sign" quantizer the only viable option for efficient inference?

For binarised convolutions, ste_sign is recommended for the activation. You can also use custom activation quantisers, but to make sure they convert correctly they should be implemented with larq.math.sign, which unfortunately is not the case for DoReFa. Regarding weight quantisation, other quantisers should work fine as long as they binarise to {-1, 1} or {-alpha, alpha}.
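For illustration, a minimal sketch of a custom activation quantiser built on larq.math.sign (the name clipped_ste_sign and the clipping threshold are hypothetical):

```python
import larq as lq
import tensorflow as tf

@tf.custom_gradient
def clipped_ste_sign(x):
    def grad(dy):
        # Straight-through estimator: only pass gradients where |x| <= 1.
        return tf.where(tf.abs(x) <= 1.0, dy, tf.zeros_like(dy))
    # larq.math.sign binarises to {-1, +1} (with sign(0) == 1), which the LCE
    # converter can recognise and fuse into binary convolutions.
    return lq.math.sign(x), grad

# Usage (hypothetical layer configuration):
layer = lq.layers.QuantConv2D(
    64, 3,
    input_quantizer=clipped_ste_sign,
    kernel_quantizer="ste_sign",
    kernel_constraint="weight_clip",
    use_bias=False,
)
```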

I recommend looking at the converted model in Netron to make sure the conversion worked as intended.
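For instance, with the netron pip package (an assumption; the Netron desktop or web app works just as well):

```python
import tensorflow as tf
import larq_compute_engine as lce
import netron  # assumption: `pip install netron`

# A placeholder model; substitute your own model here.
model = tf.keras.Sequential([
    tf.keras.layers.Input((32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3),
])

with open("model.tflite", "wb") as f:
    f.write(lce.convert_keras_model(model))

# Opens a local viewer to inspect which ops the converter actually emitted.
netron.start("model.tflite")
```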

lgeiger commented 2 years ago

I noticed some issues with the latest version (0.7.0) only, not the previous one (0.6.2): grouped convolutions (FP or binary) are converted as custom ops in the latest version.

@lluevano sorry for the delay. We just released v0.8.0, which includes a fix for this. Let me know if that works for you.