yinguobing / facial-landmark-detection-hrnet

A TensorFlow implementation of HRNet for facial landmark detection.
GNU General Public License v3.0
157 stars 40 forks

Edge TPU version #10

Open Yakuho opened 3 years ago

Yakuho commented 3 years ago

Hi, I used your open-source code to run quantization.py and it successfully generated 5 tflite versions. I picked the full-integer hrnet_quant_int_only.tflite, compiled it on Ubuntu with `edgetpu_compiler optimized/hrnet_quant_int_only.tflite`, and tried to run the model on the TPU. But I got the following error:

Yakuho commented 3 years ago
Edge TPU Compiler version 15.0.340273435
Invalid model: optimized/hrnet_quant_int_only.tflite
Model not quantized
Yakuho commented 3 years ago

I noticed that in hrnet_quant_int_only.tflite, for example in the Conv2D layers, the bias still appears to be float32.

yinguobing commented 3 years ago

I took a look at the official docs: https://coral.ai/docs/edgetpu/models-intro/#model-requirements

Running a model on the Edge TPU requires it to satisfy the requirements listed there.

These requirements are fairly strict, and I didn't take them into account when building the model, so there are probably several places where it falls short.

For example, the Edge TPU supports only the ops listed here: https://coral.ai/docs/edgetpu/models-intro/#supported-operations

But HRNet uses bilinear up-sampling, which at first glance is not on that list.

As things stand, running on the Edge TPU would require substantial modification and adaptation of both the model and the export process.
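One adaptation worth considering: nearest-neighbor up-sampling is a common Edge TPU-friendly substitute for the bilinear resize. As a hedged illustration (the helper below is ours, not from the repo), this is exactly what Keras `UpSampling2D(interpolation="nearest")` computes, written in plain numpy so the behavior is explicit:

```python
import numpy as np

def upsample_nearest(x, scale=2):
    """Nearest-neighbor up-sampling of an NHWC tensor by an integer factor.

    Each pixel is simply repeated `scale` times along the height and width
    axes, which is numerically what UpSampling2D(interpolation="nearest")
    produces.
    """
    return np.repeat(np.repeat(x, scale, axis=1), scale, axis=2)
```

Swapping `interpolation="bilinear"` for `"nearest"` changes the model's outputs slightly, so the network would likely need fine-tuning after such a change.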

Yakuho commented 3 years ago

Yes, I also noticed that you used UpSampling2D with QAT. In quantization, did you apply custom quantization to particular layers? I ask because I noticed that not everything inside the model is int8, even though I used

    if mode["IntergerOnly"]:
        converter.representative_dataset = representative_dataset
        converter.target_spec.supported_ops = [
            tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
        converter.inference_input_type = tf.int8  # or tf.uint8
        converter.inference_output_type = tf.int8  # or tf.uint8

for the conversion.
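For reference, a full-integer conversion normally pairs those settings with `optimizations = [tf.lite.Optimize.DEFAULT]` and a representative dataset; without the optimizations flag the representative dataset is ignored and weights stay float32, which would match the symptoms here. A hedged sketch (the paths and the random calibration data are placeholders; real calibration should feed actual preprocessed face images):

```python
import numpy as np

def representative_dataset():
    """Placeholder calibration generator: random arrays standing in for real
    preprocessed 256x256x3 face crops, one batch per yielded sample."""
    for _ in range(100):
        yield [np.random.uniform(0.0, 1.0, (1, 256, 256, 3)).astype(np.float32)]

def convert_full_integer(saved_model_dir="./exported/hrnetv2"):
    """Full-integer post-training quantization, mirroring the snippet above
    but with the optimizations flag and representative dataset attached."""
    import tensorflow as tf  # deferred import so the sketch loads without TF
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8  # or tf.uint8
    converter.inference_output_type = tf.int8  # or tf.uint8
    return converter.convert()
```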

yinguobing commented 3 years ago

As I recall, HRNet was built entirely with the Functional API; there are no custom layers and no custom quantization step. Float types in the model would indeed be strange. Perhaps something went wrong during the TFLite conversion?

Yakuho commented 3 years ago

Float32 types do indeed show up. I used your default conversion script for the int8 conversion, and I noticed that the conversion code you use is the same as TensorFlow's official example: https://www.tensorflow.org/lite/performance/post_training_integer_quant?hl=zh-cn

yinguobing commented 3 years ago

I noticed that the official example uses uint8 while I used int8. Could you give that a try? My development PC is already packed away, so I have no way to debug at the moment.

Yakuho commented 3 years ago

When I run hrnet_quant_int_only.tflite, something strange happens: the returned values are float. Am I doing something wrong? :-(

import cv2
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter('./optimized/hrnet_quant_int_only.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

image = cv2.imread('./docs/face.jpg')
image = cv2.resize(image, (256, 256))
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = image.astype(np.int8)
image = np.expand_dims(image, axis=0)

interpreter.set_tensor(input_details[0]['index'], image)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
Traceback (most recent call last):
  File "D:/Python_code/work/facial-marks/facial-landmark-detection-hrnet/test.py", line 22, in <module>
    interpreter.set_tensor(input_details[0]['index'], image)
  File "D:\python\lib\site-packages\tensorflow\lite\python\interpreter.py", line 423, in set_tensor
    self._interpreter.SetTensor(tensor_index, value)
ValueError: Cannot set tensor: Got value of type INT8 but expected type FLOAT32 for input 0, name: serving_default_input_1:0 

Have you run into anything similar?
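One thing worth checking before `set_tensor`: the runtime reports both the expected dtype and the quantization parameters in `input_details`, and a float image should be mapped through those parameters rather than cast to int8 directly. A sketch of the mapping (the helper name is ours):

```python
import numpy as np

def quantize_input(image_float, scale, zero_point, dtype=np.int8):
    """Map a float image (in the range the model was trained on) into the
    quantized integer domain: q = round(x / scale) + zero_point."""
    q = np.round(image_float / scale) + zero_point
    info = np.iinfo(dtype)
    return np.clip(q, info.min, info.max).astype(dtype)

# Usage against a loaded interpreter (sketch):
# assert input_details[0]['dtype'] == np.int8       # what we expect, but got float32
# scale, zero_point = input_details[0]['quantization']
# image = quantize_input(image.astype(np.float32) / 255.0, scale, zero_point)
```

Here the error goes the other way: the model still *expects* FLOAT32, which is consistent with the input never having been quantized during conversion.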

yinguobing commented 3 years ago

The quantized model raised no errors when I tested it, but thinking back, I never checked the input type at the time; the input may well have been float32, which would explain why I never hit the error you describe.

I think something went wrong during model conversion, but where exactly is unclear for now.

Yakuho commented 3 years ago

I've already tried quantizing with uint8, and it still doesn't work. I realized I used the HELEN dataset for quantization; HELEN has 68 landmarks while WFLW has 98. Could that be causing the conversion error?

yinguobing commented 3 years ago

I don't think the dataset matters much. The trained model is already fixed and is not affected by the number of landmarks. Quantization only needs images, so it has nothing to do with the landmarks either.

This problem is getting complicated. Judging from the screenshot, the converted model's parameters contain floating-point values, which shouldn't happen. Perhaps we should look at the TFLite conversion log and see whether it reveals anything useful.

Yakuho commented 3 years ago

Yes, I just tested that and confirmed it isn't the problem. Here is the conversion log. I did notice one suspicious point at the end (it appears to show quantization being skipped):

2021-04-07 17:30:18.483685: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:316] Ignored output_format.
2021-04-07 17:30:18.483728: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:319] Ignored drop_control_dependency.
2021-04-07 17:30:18.483736: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:325] Ignored change_concat_input_ranges.
2021-04-07 17:30:18.484557: I tensorflow/cc/saved_model/reader.cc:32] Reading SavedModel from: ./exported/hrnetv2
2021-04-07 17:30:18.683818: I tensorflow/cc/saved_model/reader.cc:55] Reading meta graph with tags { serve }
2021-04-07 17:30:18.683871: I tensorflow/cc/saved_model/reader.cc:93] Reading SavedModel debug info (if present) from: ./exported/hrnetv2
2021-04-07 17:30:18.683937: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-07 17:30:18.683957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-07 17:30:18.683963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
2021-04-07 17:30:19.294380: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-04-07 17:30:19.514991: I tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2021-04-07 17:30:19.638277: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3600000000 Hz
2021-04-07 17:30:26.972290: I tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: ./exported/hrnetv2
2021-04-07 17:30:27.930842: I tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 9446286 microseconds.
2021-04-07 17:30:31.140513: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:194] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2021-04-07 17:30:33.857235: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-07 17:30:33.857861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:65:00.0 name: GeForce RTX 2080 computeCapability: 7.5
coreClock: 1.71GHz coreCount: 46 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.23GiB/s
2021-04-07 17:30:33.858147: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-04-07 17:30:33.858163: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-04-07 17:30:33.858173: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-04-07 17:30:33.858291: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-04-07 17:30:33.999442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-07 17:30:33.999485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0
2021-04-07 17:30:33.999494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N
2021-04-07 17:30:37.221383: I tensorflow/lite/tools/optimize/quantize_weights.cc:222] Skipping quantization of tensor std.constant39 because it has fewer than 1024 elements (648).
2021-04-07 17:30:37.221483: I tensorflow/lite/tools/optimize/quantize_weights.cc:222] Skipping quantization of tensor std.constant117 because it has fewer than 1024 elements (648).
2021-04-07 17:30:37.221541: I tensorflow/lite/tools/optimize/quantize_weights.cc:222] Skipping quantization of tensor std.constant202 because it has fewer than 1024 elements (648).
zye1996 commented 3 years ago

It looks like the model was never actually quantized: no quantization factors can be found in the per-layer parameters of the "quantized" model.
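This observation can be checked programmatically: in the output of `tf.lite.Interpreter.get_tensor_details()`, tensors without recorded quantization parameters carry the placeholder pair `(0.0, 0)` for `(scale, zero_point)`. A hedged sketch (the helper name is ours):

```python
def tensors_without_quant_params(tensor_details):
    """Tensors whose (scale, zero_point) pair is the (0.0, 0) placeholder
    TFLite reports when no quantization parameters were recorded."""
    return [t['name'] for t in tensor_details
            if t.get('quantization', (0.0, 0)) == (0.0, 0)]
```

If most tensors land in this list, the converter skipped quantization rather than performing it and falling back to float for a few small tensors.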

zye1996 commented 3 years ago

> Yes, I just tested that and confirmed it isn't the problem. This is the conversion log [...]

Here is an implementation of quantized-model inference on the Edge TPU that you can take a look at.