Closed mengran1234 closed 1 year ago
Hi @chenliang110!
Currently, the TFLite GPU delegate can only leverage the GPU capabilities of Android/iOS devices and Edge TPUs; it has not been extended to personal computers yet. To run on a local machine such as one with an AMD GPU, please try the XNNPACK delegate or the default CPU delegate.
Thank you!
Hello, I know this. But can you help solve the problem? When I tried it on an Intel notebook, the result was correct. Thank you very much.
For example, see https://github.com/tensorflow/tensorflow/pull/54173. Based on that PR, I think TFLite can run the GPU delegate on a notebook.
Sure @mengran1234! You can build the GPU delegate and load it via the interpreter's experimental_delegates option, too:
import tensorflow as tf

delegate = None
try:
    delegate = tf.lite.experimental.load_delegate('delegate.so')
except ValueError:
    # Fall back to the CPU path if the delegate library cannot be loaded.
    pass

if delegate:
    interpreter = tf.lite.Interpreter(
        model_path='model.tflite',
        experimental_delegates=[delegate])
else:
    interpreter = tf.lite.Interpreter(model_path='model.tflite')
Can you share the gist or script used on the Intel/AMD notebook so we can replicate the issue?
Thank you!
I have not tried that method yet. I built the GPU delegate with CMake and ran a small C++ demo, then compared the GPU result with the CPU result.
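As a side note on comparing CPU and GPU results: exact equality is usually too strict, because the GPU delegate may execute in FP16. A minimal sketch of a tolerance-based comparison (the helper name and tolerances here are illustrative, not part of TFLite):

```python
import numpy as np

def outputs_match(cpu_out, gpu_out, rtol=1e-2, atol=1e-3):
    """Compare CPU and GPU delegate outputs with a loose tolerance,
    since FP16 execution on the GPU introduces rounding error."""
    return np.allclose(np.asarray(cpu_out), np.asarray(gpu_out),
                       rtol=rtol, atol=atol)

cpu = np.array([0.1234, 0.5678], dtype=np.float32)

# Small FP16-scale noise should still count as a match...
print(outputs_match(cpu, cpu + 1e-4))

# ...while a genuinely wrong result should not.
print(outputs_match(cpu, cpu + 1.0))
```

Gross mismatches that survive a tolerance this loose point to a real correctness bug rather than FP16 rounding.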
Hi @mengran1234!
Thanks for the update. Could you confirm that you used the flag below while building tflite-runtime with CMake?
-DTFLITE_ENABLE_GPU=ON
Could you provide a gist containing your CMake commands and the demo example? Thank you!
cmake tensorflow/lite ^
  -G "Visual Studio 16 2019" -A x64 ^
  -DCMAKE_BUILD_TYPE=Release ^
  -DTFLITE_C_BUILD_SHARED_LIBS=OFF ^
  -DTFLITE_ENABLE_NNAPI=OFF ^
  -DTFLITE_ENABLE_GPU=ON
cmake --build . --target demo --config Release

The demo is very simple, so there is not much to provide here.
The demo, for example:

TfLiteGpuDelegateOptionsV2 gpu_options = TfLiteGpuDelegateOptionsV2Default();
gpu_options.inference_priority1 = TFLITE_GPU_INFERENCE_PRIORITY_MIN_MEMORY_USAGE;
gpu_options.inference_priority2 = TFLITE_GPU_INFERENCE_PRIORITY_MIN_LATENCY;
gpu_options.inference_priority3 = TFLITE_GPU_INFERENCE_PRIORITY_MAX_PRECISION;
gpu_options.inference_preference = TFLITE_GPU_INFERENCE_PREFERENCE_FAST_SINGLE_ANSWER;

TfLiteDelegate* gpu_delegate = TfLiteGpuDelegateV2Create(&gpu_options);
TfLiteInterpreterModifyGraphWithDelegate(interpreter, gpu_delegate);
TfLiteInterpreterAllocateTensors(interpreter);
TfLiteInterpreterInvoke(interpreter);
TfLiteInterpreterGetOutputTensor(interpreter, 0);
Ok @mengran1234! Thanks for the update with the reproducible code snippet and commands.
@sachinprasadhs, could you look at this issue?
Thank you!
I don't have an AMD GPU to test this out. It could be an OpenCL driver bug, or our shaders may rely on undefined behavior of mobile GPUs that doesn't translate to your AMD GPU. The only thing I would try is using MAX_PRECISION as the top inference priority, to make sure FP16 isn't getting in the way. Regardless of the outcome, this is beyond our level of support.
Hello, any AMD notebook can be used to test this. I think it is a widespread problem (FP16): GPU FP16 compute is faster than FP32. I also upgraded to the newest driver on my AMD notebook, but the result is still not right.
Additionally, can you suggest a way to find the cause? For example, is there a way to save the output of every network operator? Thank you very much.
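As a toy illustration of how FP16 arithmetic alone can corrupt results (plain NumPy, nothing to do with TFLite itself): accumulating 4096 ones in float16 stalls at 2048, because above 2048 the spacing between representable float16 values is 2, so adding 1 rounds back down every time.

```python
import numpy as np

def accumulate_ones(n, dtype):
    # Sum n ones, rounding every intermediate add to the given precision.
    acc = dtype(0.0)
    for _ in range(n):
        acc = dtype(acc + dtype(1.0))
    return acc

fp32_sum = accumulate_ones(4096, np.float32)
fp16_sum = accumulate_ones(4096, np.float16)
print(fp32_sum, fp16_sum)  # 4096.0 2048.0
```

This is why forcing TFLITE_GPU_INFERENCE_PRIORITY_MAX_PRECISION (i.e. FP32 execution) is the standard first check when GPU delegate results look wrong.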
I am sorry it took me so long to reply; I was busy with other things for a while.
You probably cannot easily dump out the intermediate tensors, because the buffers are reused.
Maybe the easiest hack you can employ is to add a small number, e.g. 1e-6, to the output of an op and declare that as part of the graph output.
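The hack above can be sketched at model-construction time as follows (a hypothetical toy model; the layer sizes and names are illustrative). Adding a tiny epsilon to the tensor you want to inspect and declaring the result an extra model output keeps that buffer alive through conversion, so you can read it back after invoking the interpreter:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(8,))
hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)  # op to inspect
# Debug tap: epsilon-add exposed as a second graph output, so the
# delegate cannot recycle the intermediate tensor's buffer.
probe = tf.keras.layers.Lambda(lambda t: t + 1e-6)(hidden)
outputs = tf.keras.layers.Dense(4)(hidden)
model = tf.keras.Model(inputs=inputs, outputs=[outputs, probe])

# After conversion, the intermediate tensor survives as an extra output.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
```

Repeating this per suspect op and comparing the probe's CPU and GPU values lets you bisect which operator first diverges.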
### Issue Type
Bug

### Source
source

### Tensorflow Version
2.10 or 2.11

### Custom Code
Yes

### OS Platform and Distribution
win64

### Mobile device
AMD

### Python version
3.7

### Bazel version
No

### GCC/Compiler version
No

### CUDA/cuDNN version
No

### GPU model and memory
111

### Current Behaviour?
```shell
A bug happened!
Device: AMD Ryzen 5 5600U with Radeon Graphics (notebook).
I ran a .tflite model on the notebook PC using the GPU delegate (OpenCL backend), and the inference result is wrong. I tried other tflite models and another AMD notebook, and the GPU delegate result is wrong in both cases. Please help look into this, thank you very much.
```

### Standalone code to reproduce the issue
```shell
My configuration is the following:

TfLiteGpuDelegateOptionsV2 gpu_options = TfLiteGpuDelegateOptionsV2Default();
gpu_options.inference_priority1 = TFLITE_GPU_INFERENCE_PRIORITY_MIN_MEMORY_USAGE;
gpu_options.inference_priority2 = TFLITE_GPU_INFERENCE_PRIORITY_MIN_LATENCY;
gpu_options.inference_priority3 = TFLITE_GPU_INFERENCE_PRIORITY_MAX_PRECISION;
gpu_options.experimental_flags |= TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_QUANT;

But if I use the following configuration:

gpu_options.inference_priority1 = TFLITE_GPU_INFERENCE_PRIORITY_MAX_PRECISION;
gpu_options.inference_priority2 = TFLITE_GPU_INFERENCE_PRIORITY_MIN_MEMORY_USAGE;
gpu_options.inference_priority3 = TFLITE_GPU_INFERENCE_PRIORITY_MIN_LATENCY;

the result is right. Is it a bug?
```

### Relevant log output
_No response_