Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).
The TFLiteConverter recently switched over to using per-channel quantization for all Dense/FullyConnected layers. TFLite-Micro does not yet have support for this, and was using incorrect quantization parameters for FullyConnected layers on newly converted models. Unsurprisingly, this leads to invalid output.
While we intend to add per-channel quantization support for FullyConnected, this PR adds a runtime check for per-channel quantization until it can be supported by individual kernels. If you encounter this runtime error, you can disable the new Converter behavior by setting:
The TFLiteConverter recently switched over to using per-channel quantization for all Dense/FullyConnected layers. TFLite-Micro does not yet have support for this, and was using incorrect quantization parameters for FullyConnected layers on newly converted models. Unsurprisingly, this leads to invalid output.
While we intend to add per-channel quantization support for FullyConnected, this PR adds a runtime check for per-channel quantization until it can be supported by individual kernels. If you encounter this runtime error, you can disable the new Converter behavior by setting:
TfLiteConverter._experimental_disable_per_channel_quantization_for_dense_layers = True
https://github.com/tensorflow/tensorflow/blob/377f47694fa790e98db6665b9adecde00b5e0d68/tensorflow/lite/python/lite.py#L674BUG=b/324385802