Closed dailystudio closed 5 years ago
@dailystudio
Developers needn't care about which threads are used for calling these APIs.
Unfortunately, that is not the case when using OpenGL. OpenGL is a state machine, and a proper GL context needs to be kept around. This GL context is bound to the thread that it was created on.
https://www.khronos.org/opengl/wiki/OpenGL_and_multithreading
There are mechanisms to transfer GL contexts or use parent / children GL contexts for multithreaded architectures, but at the current level of developer preview, the exposed APIs probably don't give you enough control to do this. Open sourcing GPU is just around the corner, at which point you will have more fine-grained control of the GPU processing with respect to your multithreaded programming model.
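As a quick illustration of that per-thread binding (this uses Android's EGL14 API and is not specific to TFLite): the same query, made from two different threads, can return two different answers, because the current context is a property of the calling thread.

```java
import android.opengl.EGL14;
import android.opengl.EGLContext;

final class EglThreadCheck {
    // eglGetCurrentContext() is answered per thread: it returns the context that is
    // current on the *calling* thread, or EGL14.EGL_NO_CONTEXT if this thread has none.
    static boolean callingThreadHasGlContext() {
        EGLContext current = EGL14.eglGetCurrentContext();
        return !current.equals(EGL14.EGL_NO_CONTEXT);
    }
}
```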
Hmm... but I am not using OpenGL in my demo application. I understand your points, but from my point of view, as a developer using TFLite, I only care about how to use your APIs to achieve my own goals. There is no OpenGL-related code in my app, so having to learn and understand OpenGL contexts or multithreaded architectures feels like an imposition.

I think my case is quite typical. First, you load a model from assets and create an Interpreter for later use. Then, when you really need it, you call run() to infer the results. Calling these two steps on the main thread is absolutely not acceptable, especially in a real product. That means that in most cases these two steps will be called off the main thread, and probably on two different threads. You can't assume that every developer who uses TFLite knows about all of this. If I were using OpenGL to write the application, yes, I might be aware that there could be multithreading issues. But I am just a developer writing a standard application that uses TF to segment images. To be honest, I spent an entire afternoon finding the root cause of this issue, just because I am quite interested in and enthusiastic about TensorFlow. I am not challenging the work you have already done; I just want it to become better and be accepted by more people.

My suggestions are:

1. You can handle the OpenGL context issues in the implementation of TFLite libraries.
2. You can throw a runtime exception to warn the developer that they are using the API incorrectly and they should keep creation and inference in the same thread.
3. No matter which solution you finally decide on, please add some tips to the related section of the TensorFlow official website to tell developers about this.
Thanks for your patience in paying attention to my issue, and I hope my advice can help make TFLite better in the future.
@dailystudio
That's actually pretty good feedback and one of the reasons why we put out a developer "preview": to gather feedback like yours. We really appreciate it.
You can handle the OpenGL context issues in the implementation of TFLite libraries.
The fact that you have to be mindful of the GL context is inevitable, especially when you work in multithreaded settings. We actually tried our best to hide that away from the users (and that's why you don't see the GL context in the API), but maybe hiding it was a bad thing. If the API required you to provide the GL context (or maybe the thread that owns the GL context), perhaps that would have been better.
You can throw a runtime exception to warn the developer that they are using the API incorrectly and they should keep creation and inference in the same thread.
That's a great idea. We'll see how we can add that check without losing performance. Doing a GL context check before every runInference is probably not the right way to go ;)
No matter which solution you finally decide on, please add some tips to the related section of the TensorFlow official website to tell developers about this.
Will do.
For now, I guess the easiest trick you can employ (if you want to go down the path of multithreading) is to have a dedicated thread that does both initialization and inference, and to send a signal to that thread whenever you want to run inference.
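A minimal sketch of that pattern (the class name, method names, and tensor shapes below are placeholders, not code from the demo): a single-thread executor guarantees that the GpuDelegate and the Interpreter are created and invoked on the same worker thread.

```java
import java.nio.MappedByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.gpu.GpuDelegate;

/** Keeps Interpreter creation and inference on one dedicated worker thread. */
final class GpuInferenceThread {

    // A single-thread executor runs every task on the same worker thread, so the
    // GL context created by GpuDelegate during init is still current at run() time.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private Interpreter interpreter;
    private GpuDelegate delegate;

    void initialize(final MappedByteBuffer model) {
        executor.execute(() -> {
            delegate = new GpuDelegate();
            Interpreter.Options options = new Interpreter.Options().addDelegate(delegate);
            interpreter = new Interpreter(model, options);
        });
    }

    Future<float[][]> segment(final float[][][][] input, final int numClasses) {
        // Submitting the inference to the same executor "sends the signal" to the
        // thread that owns the interpreter and its GL context.
        return executor.submit(() -> {
            float[][] output = new float[1][numClasses];
            interpreter.run(input, output);
            return output;
        });
    }

    void close() {
        executor.execute(() -> {
            if (interpreter != null) interpreter.close();
            if (delegate != null) delegate.close();
        });
        executor.shutdown();
    }
}
```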
@impjdi Great! I am looking forward to these updates. ;)
+1 to adding some sort of warning or runtime exception that @dailystudio already mentioned. I ran into this today, so luckily there was already an issue open for it :) When I hit the issue while trying out the GPU delegate, my app just froze, and after debugging I saw that it was stuck on tflite.run.
+1 as well; it took me quite a while to find out why .run() was not returning. Anyway guys, keep up the good work, it is really appreciated and it is really cool to watch how TF is evolving. Two years ago, something like on-device GPU support was hard for me to imagine.
One note that might or might not relate to the single-thread issue. On most older devices (though OpenGL ES 3.2 capable), the preview frame rate tends to drop after a few seconds when inference runs on the GPU. So even though inference itself is faster than on the CPU, it probably blocks the TextureView somehow. Is there a general recommendation on which GPUs it makes sense to use the GpuDelegate with?
@bazinac
We have seen the frame rate drop when the device overheats. Otherwise, we have not experienced performance degradation from other factors. I mostly work on the C++ layer, so I don't know whether the Java side can cause any issues, but given that the MobileNet demo app runs fine in Java, I'm wondering whether it's device overheating.
Re: recommendation. The ideal use case is when your input data already lives in GPU memory as an SSBO; you can then call GPUDelegate.bindGlBufferToTensor() to associate that SSBO with the input tensor, so the input never has to round-trip through the CPU. There is also a similar optimization you can do for the output.
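At the time of this thread, the flow looked roughly like the sketch below. bindGlBufferToTensor(), the bind-before-modifyGraphWithDelegate ordering, and the null-input run() convention come from the developer-preview Java API and may have changed since; names such as runWithSsboInput are placeholders.

```java
import android.opengl.EGL14;
import android.opengl.GLES31;
import java.nio.MappedByteBuffer;
import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.Tensor;
import org.tensorflow.lite.gpu.GpuDelegate;

// Must run on the thread that owns a current EGL context.
void runWithSsboInput(MappedByteBuffer model, int inputSizeBytes, float[][] output) {
    if (EGL14.eglGetCurrentContext().equals(EGL14.EGL_NO_CONTEXT)) {
        throw new IllegalStateException("No current EGL context on this thread");
    }

    // Create an SSBO that your GL pipeline (e.g. camera preprocessing) will fill.
    int[] id = new int[1];
    GLES31.glGenBuffers(1, id, 0);
    GLES31.glBindBuffer(GLES31.GL_SHADER_STORAGE_BUFFER, id[0]);
    GLES31.glBufferData(GLES31.GL_SHADER_STORAGE_BUFFER, inputSizeBytes, null, GLES31.GL_STREAM_COPY);
    GLES31.glBindBuffer(GLES31.GL_SHADER_STORAGE_BUFFER, 0);
    int inputSsbo = id[0];

    // Bind the SSBO to the input tensor *before* installing the delegate.
    Interpreter interpreter = new Interpreter(model);
    Tensor inputTensor = interpreter.getInputTensor(0);
    GpuDelegate gpuDelegate = new GpuDelegate();
    gpuDelegate.bindGlBufferToTensor(inputTensor, inputSsbo);
    interpreter.modifyGraphWithDelegate(gpuDelegate);

    // ... fill inputSsbo on the GPU (e.g. a compute shader that copies the camera texture into it) ...

    // Passing null for the input tells the interpreter to read from the bound buffer.
    interpreter.run(null, output);
}
```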
Once the project is fully open sourced, you should even have access to the command buffer queue and can directly render the output of the network, if your network's output is something that can be directly rendered on screen. We didn't expose that through the API, because without the code it would be too complicated to show what's going on.
Thanks for the prompt answer. However, I am not referring to a situation that could be caused by overheating. Even when running the demo provided here on some older devices (like the Samsung Galaxy J5 2017 or Galaxy Tab S2), the frame rate drops right after a few seconds (like 3-5) when you switch to the GPU. When using the CPU, this does not happen.
Also, thanks for the recommendation on optimizing input feeding; I will try it.
@bazinac
If it's not an overheat issue, we have seen slowdowns in a few other situations, but none of those is really applicable to your situation =/
Is your phone's OpenGL driver up to date?
Anyone who has subscribed to this:
I have just submitted a change that checks whether init's EGLContext is the same as invoke's EGLContext. The word "submitted" applies to the internal code for now. I'm not sure when this will go live to the public; we are trying to decide whether we should do another dev preview release, or whether we should just go with open source, as we're pretty close ;)
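The actual check lives in the native delegate code; as a rough Java-flavored sketch of the idea only (not the real implementation):

```java
import android.opengl.EGL14;
import android.opengl.EGLContext;

final class ContextGuard {
    private final EGLContext initContext;

    ContextGuard() {
        // Capture the context that is current while the delegate is being initialized.
        initContext = EGL14.eglGetCurrentContext();
    }

    void checkOnInvoke() {
        // At invoke time, the calling thread must have the same context current;
        // otherwise the GPU delegate cannot work and should fail loudly instead of hanging.
        if (!EGL14.eglGetCurrentContext().equals(initContext)) {
            throw new IllegalStateException(
                "Interpreter.run() must be called on the same EGL context/thread used for initialization.");
        }
    }
}
```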
Not officially announced yet, but FYI: GPU code is now visible at:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/delegates/gpu
if you need the code for better insight into what is happening.
Now that the GL context check code is live, I'm gonna close this issue. Please reopen if things don't work as expected.
@impjdi Could you provide an example of how to work with SSBOs in a TFLite classification Android app?
@SanthoshRajendiran
https://github.com/tensorflow/tensorflow/issues/26297
has some shader code and its invocation. The shader code there maps a GlTexture to an SSBO, along the lines sketched below.
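For orientation, such a conversion shader generally has the shape sketched here (assuming packed RGB float input; this is not the exact code from #26297):

```java
// Sketch of a texture -> SSBO conversion compute shader (GLSL ES 3.10), kept as a
// Java string constant the way Android GL code usually embeds it.
static final String TEXTURE_TO_SSBO_SHADER =
        "#version 310 es\n"
        + "layout(local_size_x = 16, local_size_y = 16) in;\n"
        + "layout(binding = 0) uniform sampler2D u_texture;\n"
        + "layout(std430, binding = 1) buffer OutputBuffer { float data[]; };\n"
        + "uniform ivec2 u_size;\n"
        + "void main() {\n"
        + "  ivec2 gid = ivec2(gl_GlobalInvocationID.xy);\n"
        + "  if (gid.x >= u_size.x || gid.y >= u_size.y) return;\n"
        + "  vec4 px = texelFetch(u_texture, gid, 0);\n"
        + "  int idx = (gid.y * u_size.x + gid.x) * 3;\n"
        + "  data[idx + 0] = px.r;  // apply the model's input normalization here if needed\n"
        + "  data[idx + 1] = px.g;\n"
        + "  data[idx + 2] = px.b;\n"
        + "}\n";
```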
But it only works sometimes... :/
@jsolves
From past reports, we know that it hangs when the OpenGL context is not the same one that was current when the delegate was initialized. Make sure that your interpreter initialization and interpreter invoke (well, run in Java) happen on the same thread. We had a way of throwing an exception, but that collided with something else, so we had to revert that change :(
Yes, I know. I was referring to #26297.
System information
Describe the current behavior Using the following code snippet to create an Interpreter with a GPU delegate:
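The exact snippet is in the linked DeepLabLite.java; a representative reconstruction (modelBuffer stands in for the mapped model file):

```java
// Representative reconstruction of the creation snippet; see the repo for the real code.
GpuDelegate delegate = new GpuDelegate();
Interpreter.Options options = new Interpreter.Options();
options.addDelegate(delegate);
Interpreter interpreter = new Interpreter(modelBuffer, options);
```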
Calling run() on the Interpreter with the following lines of code:
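Again a representative reconstruction; the actual input and output buffers are the DeepLab image tensors from the repo:

```java
// inputBuffer / outputBuffer shapes depend on the DeepLab model used in the repo.
interpreter.run(inputBuffer, outputBuffer);
```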
If these two code snippets are called in two different threads, the thread which calls interpreter.run() will be blocked. interpreter.run() will never return. If these two code snippets are called in the same thread, interpreter.run() will be executed properly and output correct results.
Describe the expected behavior Developers needn't care about which threads are used for calling these APIs. Even if these APIs are called on different threads, interpreter.run() should return correctly without blocking.
Code to reproduce the issue The full code can be found here: https://github.com/dailystudio/ml/blob/master/deeplab/app/src/main/java/com/dailystudio/deeplab/ml/DeepLabLite.java Currently, the code in the repository works fine because new Interpreter() and interpreter.run() are called on the same thread. The DeepLabLite class has two important functions: initialize() and segment(). In initialize(), we read the TFLite model from the assets/ directory into a MappedByteBuffer:
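A sketch of the usual asset-mapping code (the asset name and method name are assumed; the real code is in DeepLabLite.initialize()):

```java
import android.content.Context;
import android.content.res.AssetFileDescriptor;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Maps the model file from assets into memory without copying it onto the Java heap.
static MappedByteBuffer loadModelFile(Context context, String assetName) throws IOException {
    AssetFileDescriptor fd = context.getAssets().openFd(assetName);
    FileInputStream stream = new FileInputStream(fd.getFileDescriptor());
    FileChannel channel = stream.getChannel();
    return channel.map(FileChannel.MapMode.READ_ONLY, fd.getStartOffset(), fd.getDeclaredLength());
}
```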
In segment(), we use that MappedByteBuffer to create an Interpreter and call run() for inference, essentially the two snippets shown above.
DeepLabLite.initialize() is called in an AsyncTask after the application launches, while DeepLabLite.segment() is called in a Loader after the user picks an image for segmentation. This code works without problems.
But if we keep the code that calls these two methods unchanged and only move the creation of the Interpreter from segment() to initialize(), then the call to interpreter.run() will block forever.
Other information Based on my tests, I suspect this problem is device-independent; it would happen on all Android devices. It should be related to the GpuDelegate: if you do not call options.addDelegate() to add a GpuDelegate, interpreter.run() also runs fine.