tensorflow / models

OD CenterNet-MobileNetV2fpn with kpts TFLite error on OpenGL GPU delegate: No shader implementation for less #10742

Open jaimeAtHumane opened 2 years ago

jaimeAtHumane commented 2 years ago

1. The entire URL of the file you are using

http://download.tensorflow.org/models/object_detection/tf2/20210210/centernet_mobilenetv2fpn_512x512_coco17_kpts.tar.gz

2. Describe the bug

The TFLite object detection model from the TF2 Detection Zoo, centernet_mobilenetv2fpn_512x512_coco17_kpts/model.tflite, fails to initialize on the GPU V2 (OpenGL) delegate on Android (tested on multiple devices running Android 10 and 12) with TFLite 2.9.0. The main error appears to be "No shader implementation for less". The same model works with the CPU XNNPack delegate, and other models from TFHub initialize successfully on the GPU delegate.

I have tried everything I can think of to get this working on the GPU, with no luck so far. That includes replacing every tf.less in research/object_detection with tf.greater (GREATER ops seem to be correctly detected as unsupported by the GPU delegate). Is there a particular version of TFLite with which this model works on the GPU, a way to replace all tf.less operations, or maybe a way to restrict operations in the TFLite converter? Could this be related to the GPU V2 delegate falling back to OpenGL instead of using OpenCL? At this point any solution would help, even a hacky one.

Here are the logcat logs:

I tflite  : Initialized TensorFlow Lite runtime.
I tflite  : Created TensorFlow Lite delegate for GPU.
E tflite  : Following operations are not supported by GPU delegate:
E tflite  : ADD: OP is supported, but tensor type/shape isn't compatible.
E tflite  : ARG_MIN: Operation is not supported.
E tflite  : CAST: Not supported Cast case.
E tflite  : CAST: Not supported Cast case. Input type: BOOL and output type: INT32
E tflite  : CAST: Not supported cast case
E tflite  : FLOOR_DIV: OP is supported, but tensor type/shape isn't compatible.
E tflite  : GATHER_ND: Operation is not supported.
E tflite  : GREATER: Not supported logical op case
E tflite  : GREATER_EQUAL: Not supported logical op case.
E tflite  : MUL: OP is supported, but tensor type/shape isn't compatible.
E tflite  : NOT_EQUAL: Not supported logical op case.
E tflite  : PACK: OP is supported, but tensor type/shape isn't compatible.
E tflite  : RESHAPE: OP is supported, but tensor type/shape isn't compatible.
E tflite  : SELECT: Operation is not supported.
E tflite  : STRIDED_SLICE: STRIDED_SLICE supports for 3 or 4 dimensional tensors only.
E tflite  : STRIDED_SLICE: Slice does not support shrink_axis_mask parameter. 
E tflite  : SUB: OP is supported, but tensor type/shape isn't compatible.
E tflite  : SUM: OP is supported, but tensor type/shape isn't compatible.
E tflite  : TILE: OP 
I tflite  : Replacing 120 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 2 partitions.
E tflite  : Can not open OpenCL library on this device - dlopen failed: library "libOpenCL.so" not found
E libEGL: call to OpenGL ES API with no current context (logged once per thread)
E tflite  : Falling back to OpenGL
I tflite  : Initialized OpenGL-based API.
E tflite  : TfLiteGpuDelegate Init: No shader implementation for less
I tflite  : Created 0 GPU delegate kernels.
E tflite  : TfLiteGpuDelegate Prepare: delegate is not initialized
E tflite  : Node number 277 (TfLiteGpuDelegateV2) failed to prepare.
E tflite  : Restored original execution plan after delegate application failure.
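
For anyone comparing runs across devices, the key facts in a log like the one above are which backend was initialized, how many delegate kernels were created, and the init error, if any. Below is a minimal sketch of pulling those out of logcat text. The function name summarize_delegate_log is hypothetical, and the patterns are assumptions based only on the message wording shown in this log, so they may need adjusting for other TFLite versions.

```python
import re


def summarize_delegate_log(log_text):
    """Extract backend, kernel count, and init error from TFLite logcat text.

    The patterns below mirror the message wording seen in this issue's log;
    they are not taken from any official log format specification.
    """
    summary = {"backend": None, "kernels": None, "init_error": None}
    for line in log_text.splitlines():
        match = re.search(r"Initialized (\w+)-based API", line)
        if match:
            summary["backend"] = match.group(1)
        match = re.search(r"Created (\d+) GPU delegate kernels", line)
        if match:
            summary["kernels"] = int(match.group(1))
        match = re.search(r"TfLiteGpuDelegate Init: (.+)", line)
        if match:
            summary["init_error"] = match.group(1).strip()
    return summary


# Three lines copied from the log above.
sample = """\
I tflite  : Initialized OpenGL-based API.
E tflite  : TfLiteGpuDelegate Init: No shader implementation for less
I tflite  : Created 0 GPU delegate kernels.
"""

print(summarize_delegate_log(sample))
```

For the sample lines this reports the OpenGL backend, 0 kernels, and the "No shader implementation for less" error, which matches the failure described in this issue.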

3. Steps to reproduce

Load and run the CenterNet MobileNetV2 with kpts model on the GPU delegate (TfLiteGpuDelegateV2), with the default TfLiteGpuDelegateOptionsV2Default options, using TFLite 2.9.0.

4. Expected behavior

The GPU delegate should be initialized correctly.

5. Additional context

Relevant Adreno information:

I/AdrenoGLES-0: QUALCOMM
    Build Date                       : 12/15/20
    OpenGL ES Shader Compiler Version: EV031.32.02.06
    Local Branch                     : 
    Remote Branch                    : 
    Remote Branch                    : 
    Reconstruct Branch               : 
    Build Config                     : S P 10.0.7 AArch64
    Driver Path                      : /vendor/lib64/egl/libGLESv2_adreno.so

6. System information

NOTE: Here is a related GitHub issue, with a similar error of an op shader missing in the GPU delegate

jaimeAtHumane commented 2 years ago

To follow up on this thread: I ran exactly the same model with the TF Android benchmark tool, android_aarch64_benchmark_model.apk, using the following command:

adb shell am start -S -n org.tensorflow.lite.benchmark/.BenchmarkModelActivity --es args '"--graph=/data/local/tmp/model.tflite --use_gpu=true"'

With this, the GPU delegate was initialized correctly. The only difference seems to be that the benchmark tool loads the GPU delegate with OpenCL, and in that case I can see the LESS operation being correctly detected as unsupported (LESS: Not supported logical op case.). However, I need to use the OpenGL backend rather than OpenCL, since my target device does not provide libOpenCL.so. What would be the solution? Here are the logs from the benchmark tool:

08-12 11:11:34.840 16002 16002 I tflite_BenchmarkModelActivity: Running TensorFlow Lite benchmark with args: --graph=/data/local/tmp/model.tflite --use_gpu=true
08-12 11:11:34.846 16002 16002 I tflite  : Log parameter values verbosely: [0]
08-12 11:11:34.846 16002 16002 I tflite  : Graph: [/data/local/tmp/model.tflite]
08-12 11:11:34.846 16002 16002 I tflite  : Use gpu: [1]
08-12 11:11:34.847 16002 16002 I tflite  : Loaded model /data/local/tmp/model.tflite
08-12 11:11:34.847 16002 16002 I tflite  : Initialized TensorFlow Lite runtime.
08-12 11:11:34.849 16002 16002 I tflite  : Created TensorFlow Lite delegate for GPU.
08-12 11:11:34.849 16002 16002 I tflite  : GPU delegate created.
08-12 11:11:34.854 16002 16002 E tflite  : Following operations are not supported by GPU delegate:
08-12 11:11:34.854 16002 16002 E tflite  : ADD: OP is supported, but tensor type/shape isn't compatible.
08-12 11:11:34.854 16002 16002 E tflite  : ARG_MIN: Operation is not supported.
08-12 11:11:34.854 16002 16002 E tflite  : CAST: Not supported Cast case. Input type: BOOL and output type: INT32
08-12 11:11:34.854 16002 16002 E tflite  : FLOOR_DIV: OP is supported, but tensor type/shape isn't compatible.
08-12 11:11:34.854 16002 16002 E tflite  : GATHER_ND: Operation is not supported.
08-12 11:11:34.854 16002 16002 E tflite  : GREATER: Not supported logical op case
08-12 11:11:34.854 16002 16002 E tflite  : GREATER: Not supported logical op case.
08-12 11:11:34.854 16002 16002 E tflite  : GREATER_EQUAL: Not supported logical op case.
08-12 11:11:34.854 16002 16002 E tflite  : LESS: Not supported logical op case.
08-12 11:11:34.854 16002 16002 E tflite  : MUL: OP is supported, but tensor type/shape isn't compatible.
08-12 11:11:34.854 16002 16002 E tflite  : NOT_EQUAL: Not supported logical op case.
08-12 11:11:34.854 16002 16002 E tflite  : PACK: OP is supported, but tensor type/shape isn't compatible.
08-12 11:11:34.854 16002 16002 E tflite  : RESHAPE: OP is supported, but tensor type/shape isn't compatible.
08-12 11:11:34.854 16002 16002 E tflite  : SELECT: Operation is not supported.
08-12 11:11:34.854 16002 16002 E tflite  : STRIDED_SLICE: STRIDED_SLICE supports for 3 or 4 dimensional tensors only.
08-12 11:11:34.854 16002 16002 E tflite  : STRIDED_SLICE: Slice does not support shrink_axis_mask parameter. 
08-12 11:11:34.854 16002 16002 E tflite  : SUB: OP is supported, but tensor type/shape isn't compatible.
08-12 11:11:34.854 16002 16002 E tflite  : SUM: OP is supported, but tensor type/shape isn't compa
08-12 11:11:34.854 16002 16002 I tflite  : Replacing 120 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 2 partitions.
08-12 11:11:40.564 16002 16002 I tflite  : Initialized OpenCL-based API.
08-12 11:11:40.980 16002 16002 I tflite  : Created 1 GPU delegate kernels.
08-12 11:11:40.982 16002 16002 I tflite  : Explicitly applied GPU delegate, and the model graph will be partially executed by the delegate w/ 1 delegate kernels.
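
The difference between the two runs can be checked mechanically by diffing the sets of ops each log reports as unsupported. The sketch below is one way to do that in Python. The helper name unsupported_ops and the regular expression are assumptions based on the line format seen in this thread, and the inputs are short excerpts rather than the full logs, since both pasted logs are truncated near the end of the op list.

```python
import re

# Matches per-op rejection lines such as
# "E tflite  : LESS: Not supported logical op case."
# with or without a logcat timestamp prefix. The op name is assumed to
# consist only of capital letters, digits, and underscores.
OP_LINE = re.compile(r"E tflite\s*: ([A-Z][A-Z0-9_]*): ")


def unsupported_ops(log_text):
    """Return the set of op names the GPU delegate reported as unsupported."""
    return {match.group(1) for match in OP_LINE.finditer(log_text)}


# Short excerpts copied from the two logs in this thread.
opengl_log = """\
E tflite  : GREATER: Not supported logical op case
E tflite  : GREATER_EQUAL: Not supported logical op case.
E tflite  : NOT_EQUAL: Not supported logical op case.
E tflite  : TfLiteGpuDelegate Init: No shader implementation for less
"""
opencl_log = """\
08-12 11:11:34.854 16002 16002 E tflite  : GREATER: Not supported logical op case.
08-12 11:11:34.854 16002 16002 E tflite  : GREATER_EQUAL: Not supported logical op case.
08-12 11:11:34.854 16002 16002 E tflite  : LESS: Not supported logical op case.
08-12 11:11:34.854 16002 16002 E tflite  : NOT_EQUAL: Not supported logical op case.
"""

only_in_opencl = sorted(unsupported_ops(opencl_log) - unsupported_ops(opengl_log))
print(only_in_opencl)  # ['LESS']
```

On these excerpts the only op rejected in the OpenCL run but not in the OpenGL run is LESS, which is consistent with the observation that the OpenGL path accepts LESS during partitioning and then fails later when no shader is found for it.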