Describe the bug
I need to train a QAT (per-tensor) model and then convert it to TFLite, but I get the "folding issue" described here.
System information
TensorFlow version (installed from source or binary): 2.15.0
TensorFlow Model Optimization version (installed from source or binary): 0.8.0
Python version: 3.10.12
Describe the expected behavior
A 1-layer CNN (Conv2D + BatchNorm + ReLU) is folded and converted to TFLite after QAT in per-tensor mode, without the computation graph being split by multiple "Quantize-Dequantize" pairs.
Describe the current behavior
After folding the 1-layer CNN (Conv2D + BatchNorm + ReLU), the folded layer is left unquantized.
Additional context
I tested #552, but for a simple 1-layer CNN (see code) there are no custom layers, so the if statement in the _replace function evaluates to False and execution falls through to the next line.
I also see that in the Keras H5 model the BN layer is quantized per-channel: in both cases the quantization parameters are tensors, not scalars as expected for per-tensor mode.