tensorflow / tflite-micro

Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).
Apache License 2.0

TFLM inference results abnormal #2629

Closed: Unbinilium closed this issue 3 days ago

Unbinilium commented 1 month ago

Hello, recently I encountered an issue when deploying a model from YOLO-World to a device using TFLM. With the same INT8 per-channel quantized TFLite model and the same image tensor as input, there is a significant discrepancy between the output tensors produced by TFLM inference and by tensorflow.lite.Interpreter.

[figure: histograms of the six INT8 output tensors; tensorflow.lite.Interpreter in blue, TFLM in orange]

As shown in the figure, the model has 6 outputs; the blue and orange histograms represent the INT8 tensors obtained from tensorflow.lite.Interpreter and from TFLM inference, respectively. In INT8 space, more than a third of the values disagree.
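For reference, a minimal sketch of how such a mismatch fraction can be computed; tfl_out and tflm_out are placeholders for the raw INT8 output tensors read back from the two interpreters:

import numpy as np

# Placeholder INT8 outputs; in practice these come from
# tf.lite.Interpreter.get_tensor(...) and the TFLM runtime.
tfl_out = np.random.randint(-128, 128, size=(1, 160, 160, 16), dtype=np.int8)
tflm_out = np.random.randint(-128, 128, size=(1, 160, 160, 16), dtype=np.int8)

# Fraction of elements whose INT8 values differ at all.
mismatch = np.mean(tfl_out != tflm_out)
# Fraction that differ by more than one quantization step
# (cast to int32 first to avoid int8 overflow in the subtraction).
off_by_more = np.mean(np.abs(tfl_out.astype(np.int32) - tflm_out.astype(np.int32)) > 1)
print(f"mismatch: {mismatch:.1%}, off by more than 1: {off_by_more:.1%}")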

[figure: decoded detections after post-processing; bounding boxes from the two runtimes differ by only a few pixels]

However, after the complex post-processing, the actual observed result shows only a few pixels of offset in the bounding boxes.

I also modified the model's flatbuffer to expose certain intermediate tensors as outputs:

[figure: edited model graph with intermediate tensors exposed as additional outputs]
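A minimal sketch of one way to make this kind of edit, assuming the flatbuffer_utils module from the TensorFlow source tree is available; the tensor index 299 is only illustrative:

from tensorflow.lite.tools import flatbuffer_utils

# Load the .tflite flatbuffer as a mutable Python object tree (ModelT).
model = flatbuffer_utils.read_model("yolo_world.tflite")

# Append an intermediate tensor (index chosen for illustration) to the
# subgraph's output list so both interpreters will expose its contents.
model.subgraphs[0].outputs = list(model.subgraphs[0].outputs) + [299]

flatbuffer_utils.write_model(model, "yolo_world_debug.tflite")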

It can be observed that errors already occur in shallow operations. As the network deepens, these errors accumulate and may lead to inaccuracies in the final results.

[figure: comparison of intermediate tensor values showing +/-1 offsets after an early convolution]

Although TFLM's logistic implementation differs from TFLite's (and perhaps other kernels do as well; since they are stored in the repository under the same names and paths, the difference is easy to miss unless you open the files and compare the implementations), the +/-1 offsets in the results right after the convolution still confuse me.
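For intuition on where +/-1 offsets can come from (this is an illustration, not a claim about TFLM's actual kernel code): quantized conv kernels rescale the INT32 accumulator down to INT8 with a multiplier plus rounding, and two implementations that handle the rounding boundary differently land exactly one quantization step apart. A toy float approximation of that rescale, with made-up scale values:

import numpy as np

acc = 5120                      # hypothetical INT32 conv accumulator
effective_scale = 1.0 / 2048.0  # input*filter/output scale (made up; exactly representable)

product = acc * effective_scale  # exactly 2.5, i.e. on a rounding boundary

away = int(np.floor(product + 0.5))  # 3: round half away from zero
even = int(np.round(product))        # 2: np.round uses round-half-to-even

print(away, even)  # 3 2 -> a +/-1 step difference in the INT8 output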

Is this a bug in TFLM? If you have any debugging suggestions or solutions, please let me know. Thanks!

Test environments:

Possibly related issues:

Unbinilium commented 1 month ago

I installed the public release of tflite_micro 0.dev20240715200401 from PyPI and modified the test script from https://github.com/tensorflow/tflite-micro/issues/2319 to reproduce:

import numpy as np

import tflite_micro as tflm
from tflite_micro.python.tflite_micro import runtime
import tensorflow as tf

print(tflm.__version__)
print(tf.__version__)

print("Checking TFLM post-installation...")
tflm.postinstall_check.passed()

print("Loading the model...")
with open("yolo_world.tflite", "rb") as f:
    tflite_model = f.read()

print("Analyzing the model...")
tf.lite.experimental.Analyzer.analyze(model_content=tflite_model)

# Preserve all intermediate tensors in both interpreters so that each
# layer's output can be compared, not just the model outputs.
tfl_interpreter = tf.lite.Interpreter(model_content=tflite_model, experimental_preserve_all_tensors=True)
tfl_interpreter.allocate_tensors()

# Note: "intrepreter_config" is the parameter name as spelled in the
# tflite_micro package.
tflm_interpreter = runtime.Interpreter.from_bytes(
    tflite_model, intrepreter_config=runtime.InterpreterConfig.kPreserveAllTensors
)

tfl_input_details = tfl_interpreter.get_input_details()
tflm_input_details = tflm_interpreter.get_input_details(0)

input_shape = tfl_input_details[0]["shape"]
input_dtype = tfl_input_details[0]["dtype"]

# Random INT8 input, fed identically to both interpreters.
dummy_input = np.random.randint(-127, 128, size=input_shape, dtype=input_dtype)

tfl_interpreter.set_tensor(tfl_input_details[0]["index"], dummy_input)
tflm_interpreter.set_input(dummy_input, 0)

print("Invoking...")
tfl_interpreter.invoke()
tflm_interpreter.invoke()

print("Comparing the results...")
for i in range(tfl_interpreter._interpreter.NumTensors(0)):
    print(f"Tensor {i}: {tfl_interpreter.get_tensor_details()[i]['name']}")
    tflm_tensor = tflm_interpreter.GetTensor(i, 0)["tensor_data"]

    try:
        tfl_tensor = tfl_interpreter.get_tensor(i, 0)
    except ValueError:
        # Some tensors cannot be read back from the TFLite interpreter.
        print("  TFL: N/A")
        print(f" TFLM: shape={tflm_tensor.shape}, dtype={tflm_tensor.dtype}")
        print("")
        continue

    # For INT8 tensors, atol=1e-3 effectively requires exact equality.
    is_match = np.allclose(tfl_tensor, tflm_tensor, atol=1e-3)
    print(f"  TFL: shape={tfl_tensor.shape}, dtype={tfl_tensor.dtype}")
    print(f" TFLM: shape={tflm_tensor.shape}, dtype={tflm_tensor.dtype}")
    print(f" MATCH: {'YES' if is_match else 'NO'}")
    print("")

Logs: log.txt
Model file: yolo_world.tflite.zip

ArmRyan commented 1 month ago

Hey @Unbinilium, if you have access to the non-TFLite model, could you try converting it with the converter._experimental_disable_per_channel_quantization_for_dense_layers = True flag and see if the issue is resolved?

Ah wait, I see TensorFlow 2.16, so this might not help...
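For reference, a sketch of where these experimental flags sit in a standard post-training INT8 quantization flow; the Keras model and calibration data below are placeholders, and flag availability depends on the TensorFlow version, as noted above:

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder calibration data; real calibration should use images
    # from the model's training distribution.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

keras_model = tf.keras.applications.MobileNet()  # stand-in for the exported YOLO-World model

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# The flag suggested above (may not be available in TF 2.16, per the note above):
converter._experimental_disable_per_channel_quantization_for_dense_layers = True
# The broader flag tried later in the thread:
# converter._experimental_disable_per_channel = True

tflite_model = converter.convert()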

Unbinilium commented 1 month ago

Hi @ArmRyan, thanks for your suggestion. I'll enable this option, re-export the TFLite model from the pth checkpoint, and test again later.

In fact, I initially suspected that this option might be causing the problem too.

However, this change was previously tested only on the device side (with the TFLite model compiled via ethos-u-vela), and it didn't seem to help (possibly there are issues with ethos-u-vela as well).

Unbinilium commented 1 month ago

With the converter._experimental_disable_per_channel flag set to True in TensorFlow 2.16.1 (which may have the same effect as converter._experimental_disable_per_channel_quantization_for_dense_layers = True), the newly converted model still shows wrong results in shallow operations, starting at the first Conv2D.

...

Tensor 297: tfl.pseudo_qconst251
  TFL: shape=(16, 3, 3, 3), dtype=int8
 TFLM: shape=(16, 3, 3, 3), dtype=int8
 MATCH: YES

Tensor 298: model_94/tf.compat.v1.pad/Pad
  TFL: shape=(1, 322, 322, 3), dtype=int8
 TFLM: shape=(1, 322, 322, 3), dtype=int8
 MATCH: YES

Tensor 299: model_94/tf.math.add/Add;model_94/tf.nn.convolution/convolution;Const_319
  TFL: shape=(1, 160, 160, 16), dtype=int8
 TFLM: shape=(1, 160, 160, 16), dtype=int8
 MATCH: NO

Tensor 300: model_94/tf.math.sigmoid/Sigmoid
  TFL: shape=(1, 160, 160, 16), dtype=int8
 TFLM: shape=(1, 160, 160, 16), dtype=int8
 MATCH: NO

...
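Since MATCH: NO on raw INT8 values doesn't convey how large the error actually is, it can help to dequantize both tensors with their scale and zero point before comparing. A sketch that continues the comparison script above, using tensor 299 (the first mismatching Conv2D output) as the example:

# Continues the comparison script above: quantify the error on tensor 299.
i = 299
detail = tfl_interpreter.get_tensor_details()[i]
scale, zero_point = detail["quantization"]

tfl_q = tfl_interpreter.get_tensor(i).astype(np.int32)
tflm_q = tflm_interpreter.GetTensor(i, 0)["tensor_data"].astype(np.int32)

diff = np.abs(tfl_q - tflm_q)
# Dequantized value is scale * (q - zero_point), so the error in real
# units is scale * |q_tfl - q_tflm|.
print(f"max abs error: {scale * diff.max():.6f}")
print(f"off-by-one fraction: {np.mean(diff == 1):.1%}")
print(f"off-by-more fraction: {np.mean(diff > 1):.1%}")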

Correspondingly, its position in the graph is:

[figure: location of the mismatching Conv2D (Tensor 299) in the model graph]

I guess I should step through the TFLM C++ code to see whether some parameter is read incorrectly from the flatbuffer.

Full log: log_per_tensor_quant.txt
Model: yolo_world_disable_per_channel.tflite.zip

Unbinilium commented 1 month ago

In addition, turning off per-channel quantization does seem to reduce the accumulation of error at the outputs:

[figure: output comparison with per-channel quantization disabled, showing reduced error accumulation]

rascani commented 1 month ago

It might be useful to try out the layer_by_layer_debugger script. It can be used to compare each layer's output against the TFLite output to determine where the two runtimes start to diverge. The tool has some rough edges, but it seems well suited to this use case.

github-actions[bot] commented 1 week ago

"This issue is being marked as stale due to inactivity. Remove label or comment to prevent closure in 5 days."

github-actions[bot] commented 3 days ago

"This issue is being closed because it has been marked as stale for 5 days with no further activity."