Open farazk86 opened 3 years ago
Hi @farazk86,
I know that this message (and repo) is rather old, but I'm testing this demo and struggling to find a way to make it work with the GPU delegate. Would you mind sharing what you did?
My understanding is that the model is not adapted to run on GPU, but I can't even start the app without a crash, so I'm curious to know how you did it. Without the modification below, the app runs perfectly and outputs about 1 word/sec.
If anyone else has insights about this, I would be really grateful as well (@Pierrci? @sayakpaul?). Sorry if that's a very noob question!
I had some difficulties related to the Gradle / TF versions, but I can now build a valid APK supporting GPU with the following modifications:
GPT2Client.kt
import org.tensorflow.lite.gpu.CompatibilityList
import org.tensorflow.lite.gpu.GpuDelegate
.......
//val opts = Interpreter.Options()
//opts.setNumThreads(NUM_LITE_THREADS)
val compatList = CompatibilityList()
val opts = Interpreter.Options().apply {
    if (compatList.isDelegateSupportedOnThisDevice) {
        // if the device has a supported GPU, add the GPU delegate
        val delegateOptions = compatList.bestOptionsForThisDevice
        this.addDelegate(GpuDelegate(delegateOptions))
    } else {
        // if the GPU is not supported, run on NUM_LITE_THREADS CPU threads
        this.setNumThreads(NUM_LITE_THREADS)
    }
}
and of course adding this to build.gradle:
implementation 'org.tensorflow:tensorflow-lite:2.5.0'
implementation 'org.tensorflow:tensorflow-lite-gpu:2.5.0'
But when I run the app it crashes on startup with the following error (tflite 2.3)
12-12 10:55:56.204 3214 3214 D Launcher: onStop
12-12 10:55:56.241 15950 15956 I zygote64: Do partial code cache collection, code=59KB, data=38KB
12-12 10:55:56.241 15950 15956 I zygote64: After code cache collection, code=57KB, data=37KB
12-12 10:55:56.241 15950 15956 I zygote64: Increasing code cache capacity to 256KB
12-12 10:55:56.461 15950 15969 D libGLESv3: Successfully load libGLESv2_oneplus.so, this=0x7581a5c008
12-12 10:55:56.463 15950 15969 I tflite : Created TensorFlow Lite delegate for GPU.
12-12 10:55:56.466 15950 15969 I tflite : Initialized TensorFlow Lite runtime.
12-12 10:55:56.477 15950 15969 I tflite : Created 0 GPU delegate kernels.
12-12 10:16:41.414 8335 8335 E AndroidRuntime: FATAL EXCEPTION: main
12-12 10:16:41.414 8335 8335 E AndroidRuntime: Process: co.huggingface.android_transformers.gpt2, PID: 8335
12-12 10:16:41.414 8335 8335 E AndroidRuntime: java.lang.IllegalArgumentException: ByteBuffer is not a valid flatbuffer model
12-12 10:16:41.414 8335 8335 E AndroidRuntime: at org.tensorflow.lite.NativeInterpreterWrapper.createModelWithBuffer(Native Method)
12-12 10:16:41.414 8335 8335 E AndroidRuntime: at org.tensorflow.lite.NativeInterpreterWrapper.<init>(NativeInterpreterWrapper.java:60)
12-12 10:16:41.414 8335 8335 E AndroidRuntime: at org.tensorflow.lite.Interpreter.<init>(Interpreter.java:224)
12-12 10:16:41.414 8335 8335 E AndroidRuntime: at co.huggingface.android_transformers.gpt2.ml.GPT2Client$loadModel$2.invokeSuspend(GPT2Client.kt:137)
12-12 10:16:41.414 8335 8335 E AndroidRuntime: at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
12-12 10:16:41.414 8335 8335 E AndroidRuntime: at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241)
12-12 10:16:41.414 8335 8335 E AndroidRuntime: at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594)
12-12 10:16:41.414 8335 8335 E AndroidRuntime: at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60)
12-12 10:16:41.414 8335 8335 E AndroidRuntime: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:740)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: FATAL EXCEPTION: main
12-12 10:55:56.488 15950 15950 E AndroidRuntime: Process: co.huggingface.android_transformers.gpt2, PID: 15950
12-12 10:55:56.488 15950 15950 E AndroidRuntime: java.lang.IllegalArgumentException: Internal error: Failed to apply delegate: Following operations are not supported by GPU delegate:
12-12 10:55:56.488 15950 15950 E AndroidRuntime: DEQUANTIZE:
12-12 10:55:56.488 15950 15950 E AndroidRuntime: DIV: Op can only handle 1 or 2 operand(s).
12-12 10:55:56.488 15950 15950 E AndroidRuntime: GATHER: Operation is not supported.
12-12 10:55:56.488 15950 15950 E AndroidRuntime: PACK: Operation is not supported.
12-12 10:55:56.488 15950 15950 E AndroidRuntime: POW: Op can only handle 1 or 2 operand(s).
12-12 10:55:56.488 15950 15950 E AndroidRuntime: SPLIT: Operation is not supported.
12-12 10:55:56.488 15950 15950 E AndroidRuntime: 106 operations will run on the GPU, and the remaining 2317 operations will run on the CPU.
12-12 10:55:56.488 15950 15950 E AndroidRuntime: TfLiteGpuDelegate Init: SLICE: Output batch don't match
12-12 10:55:56.488 15950 15950 E AndroidRuntime: TfLiteGpuDelegate Prepare: delegate is not initialized
12-12 10:55:56.488 15950 15950 E AndroidRuntime: Node nu
12-12 10:55:56.488 15950 15950 E AndroidRuntime: at org.tensorflow.lite.NativeInterpreterWrapper.applyDelegate(Native Method)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: at org.tensorflow.lite.NativeInterpreterWrapper.applyDelegates(NativeInterpreterWrapper.java:351)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: at org.tensorflow.lite.NativeInterpreterWrapper.init(NativeInterpreterWrapper.java:82)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: at org.tensorflow.lite.NativeInterpreterWrapper.<init>(NativeInterpreterWrapper.java:63)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: at org.tensorflow.lite.Interpreter.<init>(Interpreter.java:266)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: at co.huggingface.android_transformers.gpt2.ml.GPT2Client$loadModel$2.invokeSuspend(GPT2Client.kt:155)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:740)
12-12 10:55:56.494 15950 15979 D OSTracker: OS Event: crash
12-12 10:55:56.496 1222 2239 W ActivityManager: Force finishing activity co.huggingface.android_transformers.gpt2/.MainActivity
12-12 10:55:56.498 1222 1748 I ActivityManager: Showing crash dialog for package co.huggingface.android_transformers.gpt2 u0
12-12 10:55:56.502 1222 1747 D RestartProcessManager: Duration is too short, ignore : 696 in co.huggingface.android_transformers.gpt2
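To make logs like the one above easier to compare across TFLite versions, here is a minimal sketch that pulls the rejected ops and the GPU/CPU split out of the exception text. `DelegateReport`, `parseDelegateError`, `opLine` and `splitLine` are hypothetical names, not part of the demo or of TensorFlow Lite, and the regexes assume the message keeps the exact wording shown in this log.

```kotlin
// Hypothetical helper, not part of the demo or of TensorFlow Lite.
data class DelegateReport(
    val unsupportedOps: Map<String, String>, // op name -> reason ("" if none was given)
    val gpuOps: Int?,                        // ops the delegate would have taken
    val cpuOps: Int?                         // ops left on the CPU
)

// Matches "GATHER: Operation is not supported." or a bare "DEQUANTIZE:".
val opLine = Regex("""([A-Z][A-Z0-9_]*):\s*(.*)""")

// Matches the "N operations will run on the GPU ..." summary line.
val splitLine = Regex(
    """(\d+) operations will run on the GPU, and the remaining (\d+) operations will run on the CPU"""
)

fun parseDelegateError(message: String): DelegateReport {
    val ops = linkedMapOf<String, String>()
    var gpu: Int? = null
    var cpu: Int? = null
    for (raw in message.lineSequence()) {
        // Strip the logcat prefix when the text was copied from `adb logcat`;
        // lines without the prefix are used unchanged.
        val line = raw.substringAfter("AndroidRuntime: ").trim()
        val split = splitLine.find(line)
        if (split != null) {
            gpu = split.groupValues[1].toInt()
            cpu = split.groupValues[2].toInt()
            continue
        }
        // Only all-caps op names followed by a colon count as rejected ops.
        val op = opLine.matchEntire(line) ?: continue
        ops[op.groupValues[1]] = op.groupValues[2]
    }
    return DelegateReport(ops, gpu, cpu)
}
```

Passing the `message` of the caught `IllegalArgumentException` (or the pasted logcat lines) to `parseDelegateError` should give six rejected ops and a 106 / 2317 split for the log above, which shows how little of this model the delegate could take even before initialization failed.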
With tflite 2.4 the error is a bit different:
12-12 11:08:18.914 17407 17426 I tflite : Created TensorFlow Lite delegate for GPU.
12-12 11:08:18.917 17407 17426 I tflite : Initialized TensorFlow Lite runtime.
12-12 11:08:18.928 17407 17426 I tflite : Created 0 GPU delegate kernels.
12-12 11:08:18.959 17407 17407 E AndroidRuntime: FATAL EXCEPTION: main
12-12 11:08:18.959 17407 17407 E AndroidRuntime: Process: co.huggingface.android_transformers.gpt2, PID: 17407
12-12 11:08:18.959 17407 17407 E AndroidRuntime: java.lang.IllegalArgumentException: Internal error: Failed to apply delegate: Following operations are not supported by GPU delegate:
12-12 11:08:18.959 17407 17407 E AndroidRuntime: DEQUANTIZE:
12-12 11:08:18.959 17407 17407 E AndroidRuntime: GATHER: Operation is not supported.
12-12 11:08:18.959 17407 17407 E AndroidRuntime: MEAN: Mean operation supports only HW plane
12-12 11:08:18.959 17407 17407 E AndroidRuntime: SPLIT: Operation is not supported.
12-12 11:08:18.959 17407 17407 E AndroidRuntime: 147 operations will run on the GPU, and the remaining 2276 operations will run on the CPU.
12-12 11:08:18.959 17407 17407 E AndroidRuntime: TfLiteGpuDelegate Init: Tensor "Identity_8" has bad input dims size: 5.
12-12 11:08:18.959 17407 17407 E AndroidRuntime: TfLiteGpuDelegate Prepare: delegate is not initialized
12-12 11:08:18.959 17407 17407 E AndroidRuntime: Node number 2423 (TfLiteGpuDelegateV2) failed to prepare.
12-12 11:08:18.959 17407 17407 E AndroidRuntime:
12-12 11:08:18.959 17407 17407 E AndroidRuntime: Restored
12-12 11:08:18.959 17407 17407 E AndroidRuntime: at org.tensorflow.lite.NativeInterpreterWrapper.applyDelegate(Native Method)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: at org.tensorflow.lite.NativeInterpreterWrapper.applyDelegates(NativeInterpreterWrapper.java:367)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: at org.tensorflow.lite.NativeInterpreterWrapper.init(NativeInterpreterWrapper.java:85)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: at org.tensorflow.lite.NativeInterpreterWrapper.<init>(NativeInterpreterWrapper.java:63)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: at org.tensorflow.lite.Interpreter.<init>(Interpreter.java:277)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: at co.huggingface.android_transformers.gpt2.ml.GPT2Client$loadModel$2.invokeSuspend(GPT2Client.kt:155)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:740)
12-12 11:08:18.966 17407 17436 D OSTracker: OS Event: crash
12-12 11:08:18.967 1222 3147 W ActivityManager: Force finishing activity co.huggingface.android_transformers.gpt2/.MainActivity
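Both traces show the failure surfacing as an `IllegalArgumentException` thrown from the `Interpreter` constructor, so one way to at least avoid the crash is to catch it and rebuild the interpreter on CPU. This is only a sketch under assumptions: `createInterpreter` is a hypothetical helper, `model` stands for whatever buffer `loadModel` already passes to `Interpreter`, and I'm assuming the same buffer can be reused after a failed attempt. It does not make the model run on the GPU; it only falls back to the CPU path that already works.

```kotlin
import java.nio.ByteBuffer
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.CompatibilityList
import org.tensorflow.lite.gpu.GpuDelegate

// Hypothetical helper, not part of the demo: try the GPU delegate first and
// fall back to a CPU-only interpreter if the delegate cannot be applied.
fun createInterpreter(model: ByteBuffer, numThreads: Int): Interpreter {
    val compatList = CompatibilityList()
    if (compatList.isDelegateSupportedOnThisDevice) {
        val delegate = GpuDelegate(compatList.bestOptionsForThisDevice)
        try {
            return Interpreter(model, Interpreter.Options().addDelegate(delegate))
        } catch (e: IllegalArgumentException) {
            // "Failed to apply delegate" arrives as IllegalArgumentException
            // (see the stack traces above). Release the native delegate and
            // fall through to the CPU configuration.
            delegate.close()
        }
    }
    // CPU path: same settings as the original, commented-out code.
    return Interpreter(model, Interpreter.Options().setNumThreads(numThreads))
}
```

In `GPT2Client.kt` this would be called as `createInterpreter(model, NUM_LITE_THREADS)` in place of the direct `Interpreter(...)` construction, where `model` is a placeholder for the buffer the demo already loads.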
Hi,
I cannot achieve the speed demonstrated in the gif: https://github.com/huggingface/tflite-android-transformers/tree/master/gpt2
It takes about 7 seconds to generate a single word on my build. I am even using GpuDelegate to run the interpreter on the GPU rather than the CPU, and it's still slower.
Has the gif been sped up? Am I the only one having this poor performance?
Thanks