Profiled Latency vs Real FPS

varunatohilo commented 2 months ago

Plugin Version or Commit ID

v0.14.3

Unity Version

2022.3.33f1

Your Host OS

Windows 11

Target Platform

Android

Description

I am working with the Pose Detection sample, and its performance in the Unity Editor is good (130 FPS on my laptop: RTX 3050 + Ryzen 5600H). However, the sample does not run well on lower-end Android phones. The one I primarily test on is a Samsung M30s, using the Sync running mode with the Lite model and only "smooth landmarks" enabled.

Below is the deep-profiling data:

After investigating for a few days, I noticed that WebCamTexture.GetPixels32() and MoveNext() are only called once every 4 (sometimes 5) frames, producing a comb-like pattern in the profiler whenever they run. Why is that? And why do these calls take so long?

Because of these methods, vsync caps the frame rate at 30 FPS. I have two questions. First, even though vsync keeps the frame rate from dropping below 30 FPS, when I play the game on the phone it feels very laggy (like 13-14 FPS). What could be the reason for this?

and second, is there any way to make this work better?

Code to Reproduce the issue

No response

Additional Context

No response

homuler commented 2 months ago

it feels very laggy (like 13-14 FPS). What could be the reason for this?

Could you ask the question in a more technically precise manner? If we are only listing possibilities, there could be countless causes. For example, if inference itself takes around 60 ms, the pipeline will naturally run at only about 15 FPS.
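The arithmetic behind the 60 ms example can be sketched as follows. This is a minimal sketch that assumes a blocking (Sync) pipeline, where each rendered frame waits for inference to finish, and a display that presents frames only on vsync boundaries; `effective_fps` is a hypothetical helper, not part of the plugin. Without vsync the bound is 1000 / 60, or about 16.7 FPS; with 60 Hz vsync, 60 ms rounds up to 4 refresh intervals (about 66.7 ms), which gives 15 FPS.

```python
import math


def effective_fps(inference_ms: float, refresh_hz: float = 60.0) -> float:
    """Upper bound on the frame rate of a blocking pipeline under vsync.

    Hypothetical helper. Assumes every frame waits for inference to finish
    and that vsync rounds each frame up to a whole number of refresh intervals.
    """
    # Number of refresh intervals one inference spans, rounded up (at least 1).
    intervals = max(1, math.ceil(inference_ms * refresh_hz / 1000.0))
    return refresh_hz / intervals


print(f"{1000.0 / 60:.1f}")  # no vsync: about 16.7 FPS
print(effective_fps(60.0))   # 60 Hz vsync: 15.0
```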

and second, is there any way to make this work better?

It depends on the cause.

varunatohilo commented 2 months ago

I studied the code to understand this problem at a deeper technical level and found that the methods are called only periodically (every 4-5 frames) because of the FlowLimiterCalculator used in this node:

node {
  calculator: "FlowLimiterCalculator"
  input_stream: "input_video"
  input_stream: "FINISHED:roi_from_landmarks"
  input_stream_info: {
    tag_index: "FINISHED"
    back_edge: true
  }
  output_stream: "throttled_input_video"
}

When I remove the node, no frames are dropped and the methods are called every frame. I suspect the 4-5 frame gap occurs because the model takes that long to run inference on each image. Please let me know if I've got anything wrong.
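The frame-dropping behaviour described above can be illustrated with a small simulation. This is a minimal sketch, assuming the limiter uses its default of one frame in flight with no queue: a new camera frame is admitted only after the previous inference has signalled completion through the FINISHED back edge, and frames that arrive in the meantime are dropped. The function name, the 30 FPS camera, and the 120 ms inference time are hypothetical values chosen for illustration, not measurements from the M30s.

```python
def simulate_flow_limiter(frame_interval_ms: float, inference_ms: float,
                          num_frames: int) -> list[int]:
    """Return the indices of camera frames that reach the model.

    Hypothetical model of a flow limiter with one frame in flight and no
    queue: a frame is admitted only if the previous inference has finished;
    otherwise it is dropped.
    """
    admitted = []
    busy_until = 0.0  # time at which the in-flight inference completes
    for i in range(num_frames):
        arrival = i * frame_interval_ms
        if arrival >= busy_until:
            admitted.append(i)
            busy_until = arrival + inference_ms
    return admitted


print(simulate_flow_limiter(1000.0 / 30, 120.0, 12))  # [0, 4, 8]
```

With these assumed numbers only every 4th frame reaches the model, so landmarks update about 30 / 4 = 7.5 times per second even though the app still renders at 30 FPS. Removing the limiter sends every frame to the model, but if inference remains slower than the camera, frames would typically back up inside the graph and latency would grow instead.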

It depends on the cause

I believe that the cause is the device's lack of resources. The only room for improvement I see is making the WebCamTexture.GetPixels32() function faster somehow.

Other things I tried to make inference faster: I only need the pose landmarks and pose world landmarks, so I removed all the other output streams (segmentation mask, ROI, and pose detection). This made no noticeable difference; the model still ran at the same speed.

Please nudge me in a direction to make this model infer any faster. Sorry if this information isn't enough.

homuler commented 2 months ago

Please nudge me in a direction to make this model infer any faster.

Although it doesn't inherently make things faster, I recommend using the Pose Landmarker Task API instead of using CalculatorGraph directly (the sample scenes will also eventually be replaced with ones that use the Task API). Additionally, at least for low-spec devices, it is better to use the lite model and lower the resolution of the input images. Please see the Pose Landmark Detection sample scene to see how to use the API.
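To see why a lower input resolution helps on the CPU side, consider how much pixel data a readback such as WebCamTexture.GetPixels32() has to copy: each Color32 pixel is 4 bytes (RGBA). This is a minimal sketch, assuming the copy cost scales roughly with the number of bytes; `rgba32_bytes_per_second` is a hypothetical helper, and the resolutions are example values, not the M30s camera's actual settings.

```python
def rgba32_bytes_per_second(width: int, height: int, fps: float) -> float:
    """Bytes of RGBA32 pixel data copied per second (4 bytes per pixel)."""
    return width * height * 4 * fps


# Compare two example camera resolutions at 30 FPS.
for w, h in [(1280, 720), (640, 480)]:
    mb_per_s = rgba32_bytes_per_second(w, h, 30) / 1e6
    print(f"{w}x{h}: {mb_per_s:.1f} MB/s")
```

This prints 110.6 MB/s for 1280x720 and 36.9 MB/s for 640x480, so under this assumption dropping to 640x480 cuts the per-frame copy work by a factor of 3.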

The only room for improvement I see is making the WebCamTexture.GetPixels32() function faster somehow.

The sample scene above is implemented without using WebCamTexture.GetPixels32: https://github.com/homuler/MediaPipeUnityPlugin/blob/c2d594e0639c3d0e9bafb58f9334928db7adc5ee/Assets/MediaPipeUnity/Samples/Scenes/Tasks/Pose%20Landmark%20Detection/PoseLandmarkerRunner.cs#L87-L89
