MLIR-based PTQ produces MEAN ops with different quantization scales for inputs and outputs.

freedomtan commented 3 years ago

1. System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): platform agnostic
TensorFlow installation (pip package or built from source): when MLIR-based post-training quantization is used
TensorFlow library (version, if pip package or github SHA, if built from source): when MLIR-based post-training quantization is used

2. Code

Post-training quantization (PTQ) may produce a MEAN op whose input and output have different quantization scales. Because NNAPI only supports MEAN when the input and output share the same quantization parameters, such MEAN ops cannot be delegated to NNAPI via the NNAPI delegate.

E.g., if we do PTQ on the ResNet 50 from tfhub,

import pathlib
import requests
import tarfile

import tensorflow as tf
import tensorflow_datasets as tfds

saved_model_dir = "./resnet_saved_model/"
resnet_saved_model_file = "https://tfhub.dev/tensorflow/resnet_50/classification/1?tf-hub-format=compressed"
response = requests.get(resnet_saved_model_file, stream=True)
file = tarfile.open(fileobj=response.raw, mode="r|gz")
file.extractall(path=saved_model_dir)

imagenet_validation = tfds.load(name="imagenet2012", split="validation")

def representative_data_gen():
  for imagenet_example in imagenet_validation.take(100):
    image, label = imagenet_example["image"], imagenet_example["label"]
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.image.central_crop(image, central_fraction=0.875)
    image = tf.expand_dims(image, 0)
    image = tf.image.resize(image, (224,224))
    yield [image]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8 
converter.inference_output_type = tf.int8 
tflite_quant_model = converter.convert()

tflite_quant_model_file = pathlib.Path('/tmp')/"resnet50_quant.tflite"
tflite_quant_model_file.write_bytes(tflite_quant_model)

then adb push /tmp/resnet50_quant.tflite /data/local/tmp/, and run benchmark_model on device (here I ran it on a Pixel 4)

$ ./benchmark_model_validation --graph=resnet50_quant.tflite --use_nnapi=1 --enable_op_profiling=1

It shows something like:

STARTING!
Log parameter values verbosely: [0]
Graph: [resnet_v15_quant_original_mean.tflite]
Enable op profiling: [1]
Use NNAPI: [1]
NNAPI accelerators available: [qti-default,qti-dsp,qti-gpu,google-edgetpu,nnapi-reference]
Loaded model resnet50_quant.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for NNAPI.
NNAPI delegate created.
WARNING: Operator MEAN (v2) refused by NNAPI delegate: NNAPI requires that the input and output have the same quantization parameters.
INFO: Replacing 75 node(s) with delegate (TfLiteNnapiDelegate) node, yielding 3 partitions.
Explicitly applied NNAPI delegate, and the model graph will be partially executed by the delegate w/ 2 delegate kernels.
.......
Operator-wise Profiling Info for Regular Benchmark Runs:
============================== Run Order ==============================
                 [node type]              [start]     [first]    [avg ms]        [%]      [cdf%]      [mem KB]  [times called]  [Name]
         TfLiteNnapiDelegate                0.009      14.431      14.449    48.545%     48.545%         0.000          1   [resnet50/activation_48/Relu;resnet50/add_15/add]:76
                        MEAN               14.460      13.399      13.501    45.359%     93.904%         0.000          1   [resnet50/reduce_mean/Mean]:73
         TfLiteNnapiDelegate               27.962       1.754       1.814     6.096%    100.000%         0.000          1   [StatefulPartitionedCall:0]:77
.......
Number of nodes executed: 3
============================== Summary by node type ==============================
                 [Node type]      [count]     [avg ms]      [avg %]     [cdf %]   [mem KB]  [times called]
         TfLiteNnapiDelegate            2       16.263      54.642%     54.642%      0.000          2
                        MEAN            1       13.500      45.358%    100.000%      0.000          1

Timings (microseconds): count=50 first=29584 curr=22585 min=22585 max=33233 avg=29764 std=1779
Memory (bytes): count=0
3 nodes observed
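
The refusal in the log above comes down to the MEAN input and output tensors carrying different (scale, zero_point) pairs. A minimal sketch for checking this on the host before pushing the model to a device is shown below. It assumes tensor details shaped like the dicts returned by tf.lite.Interpreter.get_tensor_details(); the helper name mismatched_quantization and the MEAN_INPUT_INDEX / MEAN_OUTPUT_INDEX placeholders are made up for illustration and are not part of TensorFlow.

```python
def mismatched_quantization(tensor_details, input_index, output_index):
    """Compare the (scale, zero_point) pairs of two tensors.

    `tensor_details` is a list of dicts shaped like the output of
    tf.lite.Interpreter.get_tensor_details(): each entry has an "index"
    and a "quantization" tuple of (scale, zero_point).
    Returns None when both tensors share the same parameters, otherwise
    the pair (input_params, output_params).
    """
    by_index = {t["index"]: t for t in tensor_details}
    in_q = tuple(by_index[input_index]["quantization"])
    out_q = tuple(by_index[output_index]["quantization"])
    return None if in_q == out_q else (in_q, out_q)


# Hypothetical usage on the host; the tensor indices of the MEAN op are
# placeholders and have to be looked up for the actual model:
#   import tensorflow as tf
#   interpreter = tf.lite.Interpreter(model_path="/tmp/resnet50_quant.tflite")
#   details = interpreter.get_tensor_details()
#   print(mismatched_quantization(details, MEAN_INPUT_INDEX, MEAN_OUTPUT_INDEX))
```

A non-None result for the MEAN op's tensors would be consistent with the "same quantization parameters" warning printed by the NNAPI delegate.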

3. Possible solutions

a. Force MEAN to have the same scale for inputs and outputs, as I did in https://github.com/tensorflow/tensorflow/pull/51373

With this constraint applied, for the same ResNet 50 model, I got something like the following:

....
Number of nodes executed: 1
============================== Summary by node type ==============================
                 [Node type]      [count]     [avg ms]      [avg %]     [cdf %]   [mem KB]  [times called]
         TfLiteNnapiDelegate            1       14.762     100.000%    100.000%      0.000          1

Timings (microseconds): count=67 first=14774 curr=15087 min=13959 max=15226 avg=14762.4 std=248
Memory (bytes): count=0
1 nodes observed

However, as discussed in https://github.com/tensorflow/tensorflow/pull/51373, this may run into problems.

b. Extend the NNAPI MEAN op to support different quantization parameters for input and output.

@miaowang14 is this possible?

c. If modifying the original model is viable, as @renjie-liu suggested in https://github.com/tensorflow/tensorflow/pull/51373, replace MEAN with reduce_sum or a global avg_pool, which is mathematically equivalent.

Maybe this could be an option in the converter or quantizer?
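
The equivalence in option c can be illustrated in floating point with a small pure-Python sketch: a mean over the H and W axes gives the same result as a sum over those axes multiplied by the constant 1 / (H * W), which is also what a global average pool computes. The function names and the tiny 2x2x2 input are made up for illustration, and the sketch says nothing about how the rewrite behaves after quantization.

```python
def reduce_mean_hw(x):
    """MEAN over the H and W axes of an H x W x C nested list."""
    h, w, channels = len(x), len(x[0]), len(x[0][0])
    return [
        sum(x[i][j][c] for i in range(h) for j in range(w)) / (h * w)
        for c in range(channels)
    ]


def reduce_sum_then_scale(x):
    """Sum over H and W, then multiply by the constant 1 / (H * W)."""
    h, w, channels = len(x), len(x[0]), len(x[0][0])
    inv_area = 1.0 / (h * w)
    return [
        sum(x[i][j][c] for i in range(h) for j in range(w)) * inv_area
        for c in range(channels)
    ]


# A 2 x 2 spatial map with 2 channels.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
print(reduce_mean_hw(feature_map))
print(reduce_sum_then_scale(feature_map))
```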

abattery commented 3 years ago

@renjie-liu could you take a look at this? This is a good example case where the quantization + NNAPI is involved.

renjie-liu commented 3 years ago

Hi Freedom, do you have a tflite model we can debug internally? thanks!

freedomtan commented 3 years ago

@renjie-liu you can find the PTQ resnet50 tflite generated by the aforementioned PTQ script at https://drive.google.com/file/d/1af1ucoBg4zQ0KJdcyEfVSWyqcXYJdOym/view?usp=sharing

renjie-liu commented 3 years ago

Hi Freedom, https://github.com/tensorflow/tensorflow/commit/8c90a182b8e200a870ddc5dc4fb9d7ee10f04cbe

Can you try bazel run //tensorflow/compiler/mlir/lite/experimental/tac:tac-translate -- <INPUT_MODEL> -o=<OUTPUT_MODEL> --device-specs=NNAPI

And see if it works for your case?

Note that we are replacing the MEAN with avg_pool -> requantize, which may cause accuracy issues, so it would be great to get your help validating the model.
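
For readers following along, one way the avg_pool -> requantize rewrite could lose a little accuracy is the extra rounding step: the pooled value is first rounded in the input's quantized domain and then rounded again when mapped to the output scale. The sketch below is a simplified model of that arithmetic, not the actual TFLite or NNAPI kernels; the function names, the round-to-nearest rule, and the example scales are all assumptions chosen for illustration.

```python
import math


def _round_nearest(x):
    """Round to the nearest integer (assumed rounding rule for this sketch)."""
    return math.floor(x + 0.5)


def _clamp_int8(q):
    return max(-128, min(127, q))


def mean_single_rounding(q_in, in_scale, in_zp, out_scale, out_zp):
    """Idealized reference: compute the real-valued mean, then quantize once."""
    real_mean = in_scale * (sum(q_in) / len(q_in) - in_zp)
    return _clamp_int8(_round_nearest(real_mean / out_scale) + out_zp)


def avg_pool_then_requantize(q_in, in_scale, in_zp, out_scale, out_zp):
    """Two steps: average in the input's quantized domain, then requantize."""
    # Step 1: avg_pool keeps the input scale and zero point, so its result
    # is rounded to an int8 value in that domain.
    pooled = _clamp_int8(_round_nearest(sum(q_in) / len(q_in)))
    # Step 2: requantize maps the pooled value to the output parameters.
    return _clamp_int8(
        _round_nearest((pooled - in_zp) * in_scale / out_scale) + out_zp
    )


q = [10, 10, 10, 11]
print(mean_single_rounding(q, 0.5, 0, 0.125, 0))      # 41
print(avg_pool_then_requantize(q, 0.5, 0, 0.125, 0))  # 40
```

In this toy case the two paths differ by one quantization step of the output, which is the kind of small deviation worth checking against a real accuracy evaluation.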

Thanks!

freedomtan commented 3 years ago

@renjie-liu Thanks. So far, it works for all the models I tested functionally. I'll check accuracy later.

freedomtan commented 3 years ago

FYI: yes, there is some accuracy loss, but it seems acceptable for image classification. For this ResNet case, I verified with the TFLite image classification evaluation tool. The model with MEAN:

Top-1 Accuracy: 0.76332
Top-2 Accuracy: 0.86308
Top-3 Accuracy: 0.8974
Top-4 Accuracy: 0.91452
Top-5 Accuracy: 0.92644
Top-6 Accuracy: 0.93486
Top-7 Accuracy: 0.9403
Top-8 Accuracy: 0.94442
Top-9 Accuracy: 0.94764
Top-10 Accuracy: 0.95084

The model with MEAN replaced by avg_pool + requantize:

Top-1 Accuracy: 0.76286
Top-2 Accuracy: 0.8619
Top-3 Accuracy: 0.89688
Top-4 Accuracy: 0.91464
Top-5 Accuracy: 0.92534
Top-6 Accuracy: 0.9334
Top-7 Accuracy: 0.93922
Top-8 Accuracy: 0.94358
Top-9 Accuracy: 0.94674
Top-10 Accuracy: 0.94996
