Open CvHadesSun opened 3 years ago
Hi @CvHadesSun ,
Seems mostly working as intended. Let me know if I've got you wrong.
Thanks for your reply @teijeong: 1. First, I understand the model size issue now, thank you.
Can you try setting converter.inference_type = tf.uint8? (and converter.inference_input_type and converter.inference_output_type if needed)
Thanks for the reply. 1. Now I am sure my NPU only supports the uint8 NNAPI delegate, and I tried setting converter.inference_type = tf.uint8 with inference_input_type and inference_output_type also set to tf.uint8, but however I quantize the SavedModel-format Keras model, the weights always come out as int8, not uint8. The quantization code and log are:
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]  # or converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter.inference_type = tf.uint8
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_quant_model = converter.convert()
log:
2021-08-05 16:16:53.576560: I tensorflow/cc/saved_model/reader.cc:100] Reading SavedModel from: ../../weights/mnist
2021-08-05 16:16:53.587798: I tensorflow/cc/saved_model/reader.cc:71] Reading meta graph with tags { serve }
2021-08-05 16:16:53.587821: I tensorflow/cc/saved_model/reader.cc:144] Reading SavedModel debug info (if present) from: ../../weights/mnist
2021-08-05 16:16:53.635796: I tensorflow/cc/saved_model/loader.cc:210] Restoring SavedModel bundle.
2021-08-05 16:16:53.995971: I tensorflow/cc/saved_model/loader.cc:194] Running initialization op on SavedModel bundle at path: ../../weights/mnist
2021-08-05 16:16:54.038379: I tensorflow/cc/saved_model/loader.cc:283] SavedModel load for tags { serve }; Status: success: OK. Took 462029 microseconds.
2021-08-05 16:16:54.197794: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:210] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
fully_quantize: 0, inference_type: 6, input_inference_type: 3, output_inference_type: 3
WARNING:absl:For model inputs containing unsupported operations which cannot be quantized, the `inference_input_type` attribute will default to the original type.
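(Side note, not from the original comment: to double-check what actually ended up in the flatbuffer, a small inspection sketch like the one below, assuming the tflite_quant_model produced above, prints the input/output dtypes and the dtypes of the quantized tensors.)

import numpy as np
import tensorflow as tf

# Inspect the converted model to see which dtypes the converter actually produced.
interpreter = tf.lite.Interpreter(model_content=tflite_quant_model)
interpreter.allocate_tensors()

print("inputs :", [(d["name"], d["dtype"]) for d in interpreter.get_input_details()])
print("outputs:", [(d["name"], d["dtype"]) for d in interpreter.get_output_details()])

# Weight and activation tensors; with the TF2 converter these come out as int8.
for d in interpreter.get_tensor_details():
    if np.issubdtype(d["dtype"], np.integer):
        print(d["name"], d["dtype"], d["quantization"])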
import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file='/path/to/mobilenet_v1_1.0_224/frozen_graph.pb',
    input_arrays=['input'],
    input_shapes={'input': [1, 224, 224, 3]},
    output_arrays=['MobilenetV1/Predictions/Softmax'],
)
converter.quantized_input_stats = {'input': (0., 1.)}  # mean, std_dev (input range is [-1, 1])
converter.inference_type = tf.int8  # this is the recommended type.
tflite_model = converter.convert()

with open('quantized_model.tflite', 'wb') as f:
    f.write(tflite_model)
3. Are there any other methods to solve my problem? :)
@teijeong I found a note on https://www.tensorflow.org/lite/performance/quantization_spec stating that uint8 is for old tooling. Do you happen to know in which version the change happened, and where we can find the old documentation? Thanks
Note: In the past our quantization tooling used per-tensor, asymmetric, uint8 quantization. New tooling, reference kernels, and optimized kernels for 8-bit quantization will use this spec.
In the TF2 converter the inference_type flag doesn't exist (it is ignored if the user defines it). We only support the INT8 (or int8) quantization type and we don't support the QUANTIZED_UINT8 (or uint8) quantization type. The reason it was removed is listed here: https://github.com/tensorflow/tensorflow/issues/38285#issuecomment-635533037

Is your model trained in TF1? If yes, you can convert and uint8-quantize your TF1 SavedModel in TF2 as follows:
import tensorflow as tf
converter = tf.compat.v1.lite.TFLiteConverter.from_saved_model(saved_model_dir,
input_arrays=['input'],
input_shapes={'input' : [1, 224, 224,3]},
output_arrays=['MobilenetV1/Predictions/Softmax']
)
converter.quantized_input_stats = {'input' : (0., 1.)} # mean, std_dev (input range is [-1, 1])
converter.inference_type = tf.int8 # this is the recommended type.
# converter.inference_input_type=tf.uint8 # optional
# converter.inference_output_type=tf.uint8 # optional
tflite_model = converter.convert()
# Save the model.
with open('quantized_model.tflite', 'wb') as f:
f.write(tflite_model)
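A note on quantized_input_stats (an aside of mine, not part of the reply above): the TF1 converter documents it as real_input_value = (quantized_input_value - mean_value) / std_dev_value, so the (mean, std_dev) pair has to match the input range your model actually expects. A small, hypothetical helper to derive the pair from a float input range, assuming uint8 inputs in [0, 255]:

def stats_for_float_range(min_val, max_val, quant_min=0, quant_max=255):
    # From real_value = (quantized_value - mean) / std_dev it follows that:
    #   std_dev = (quant_max - quant_min) / (max_val - min_val)
    #   mean    = quant_min - min_val * std_dev
    std_dev = (quant_max - quant_min) / (max_val - min_val)
    mean = quant_min - min_val * std_dev
    return mean, std_dev

# e.g. a float input range of [-1, 1] with uint8 inputs gives (127.5, 127.5)
print(stats_for_float_range(-1.0, 1.0))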
@MeghnaNatraj, I tried your advice, rebuilt the model with TF 1.13.1, and got the uint8 quantized tflite model, thank you.
Also, the TF1.x quantization API needs converter.default_ranges_stats; if I don't use the quantization-aware training method, is there another way to find the default_ranges_stats (min and max values)? Thanks.
You don't need the default_ranges_stats flag for quantization. It's an optional field that we discourage users from using if possible.
Is there anything missing in your model? Are you looking to modify it further, and in what way?
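For reference (my own aside, not part of the reply above): in the TF1 converter, default_ranges_stats is simply a (min, max) tuple used as a fallback range for any tensor that has no recorded min/max ("dummy quantization"), so it only matters when calibration/QAT ranges are missing:

# "Dummy quantization" fallback range for tensors without recorded min/max.
# The values are model-dependent and for experimentation only.
converter.default_ranges_stats = (0.0, 6.0)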
Hi, I am trying to quantize my own tf.keras model for the NNAPI delegate.

First, the original tf.keras model size is 5.6 MB and the quantized int8 tflite model is about 1.8 MB, which is not a 4x reduction.

More importantly, when I run the quantized int8 tflite model on the benchmark tool (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark), inference with use_nnapi=true is slower than with use_nnapi=false (202883 us vs 45882.2 us). I also tested the pre-quantized models from TensorFlow Hub, and their inference times are fine on my mobile device (about a 3x reduction). But when I quantize the original SavedModel-format models from TensorFlow Hub myself, following the tutorial (https://www.tensorflow.org/lite/performance/post_training_quantization), the result is the same as with my own quantized int8 tflite model, e.g. mobilenet_v2_130_224: size 21.6 MB / 6.3 MB, time (nnapi true/false): 104323 us / 42137.5 us. The other models behave the same as mine. Meanwhile, I quantized the MNIST example and got the same result (nnapi=true is slower than nnapi=false). I also wanted to quantize the provided example pair of quantized and original models, but they are not in SavedModel format, so I cannot load them for this process.

So, is there a problem in my quantization process?
System information
Thx.