homuler / MediaPipeUnityPlugin


Android Selfie segmentation poor performance - 10 fps #670

Closed · LoopIssuer closed 2 years ago

LoopIssuer commented 2 years ago

Feature Description

Selfie segmentation sample scene in Android build

Current Behaviour/State

Selfie segmentation sample scene in Android build - performance is very low, about 10 fps. Could you please tell me what causes this? Can I improve it somehow, or does it depend on the libraries?

Is improved performance perhaps planned for a future milestone?

Thanks in advance for the answer.

Additional Context

No response

homuler commented 2 years ago

Please test the official sample app first. If the Unity sample performs much worse than it, then let me know.

LoopIssuer commented 2 years ago

Unity sample performs much worse, depending on the running mode:

• when set to async or nonBlocking sync, fps is about 30 - but there is green screen flickering (especially in nonBlocking)

Phone is Samsung A50

homuler commented 2 years ago

> when set to async or nonBlocking sync, fps is about 30 - but there is green screen flickering (especially in nonBlocking)

When running in async mode, try setting a larger value for timeoutMillisec. This will usually mitigate the flickering issue.
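For example (an assumption on my part: that your scene wires up the sample's GraphRunner, which exposes timeoutMillisec as a public field; a sketch, not a definitive fix):

  // Assumption: graphRunner is the scene's GraphRunner, as wired in the bundled samples.
  // A larger timeout gives the async graph more time to emit the mask before it is
  // treated as missing, reducing flicker at the cost of some latency.
  graphRunner.timeoutMillisec = 300;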

LoopIssuer commented 2 years ago

My fps measurement in the Unity sample:

  // FPS counter: averages frame time over the last `samples` frames.
  [SerializeField] private UnityEngine.UI.Text Text; // UI text that displays the fps
  private const int samples = 30; // averaging window (value assumed)
  private int count = samples;
  private float totalTime;

  public void Update()
  {
    count -= 1;
    totalTime += Time.deltaTime;

    if (count <= 0)
    {
      float fps = samples / totalTime;
      Text.text = fps.ToString();
      totalTime = 0f;
      count = samples;
    }
  }
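For a quick cross-check, Unity's built-in smoothed frame time gives a comparable instantaneous reading without manual accumulation:

  // Instantaneous FPS estimate from Unity's smoothed frame time (standard Unity API).
  float fps = 1f / Time.smoothDeltaTime;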

The official sample is visually much, much better - it looks like 30 fps. In the Unity sample, when fps is about 10-15, there is visible input lag (about 1 second).

Setting timeoutMillisec to a larger value, 300 (smaller values do no good), eliminated the flickering. However, there is lag when the user moves - the background is not cropped immediately, and the user's silhouette lingers for about 1 second.

homuler commented 2 years ago

I don't have time to examine the issue in detail, so I will only write the information that seems necessary/useful.

ROBYER1 commented 2 years ago

What resolution is your phone camera running at? I have a Sony phone with very high-res front and rear cameras; if Android is using a 4K texture from the camera, the phone can struggle. If so, try using a lower camera texture resolution.
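In Unity, a lower resolution can be requested when the camera texture is created (a sketch using the standard WebCamTexture API; the sample itself configures this through its ImageSource options):

  // Request a lower camera resolution; the device picks the closest supported mode.
  var webCamTexture = new WebCamTexture(WebCamTexture.devices[0].name, 640, 360, 30);
  webCamTexture.Play();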

LoopIssuer commented 2 years ago

Hi @ROBYER1, it's 640x360. By "lower camera texture value", do you mean in the options of this sample, or somewhere in code? I have already tried the lowest resolutions in the options, but that only gains about 2 fps.

homuler commented 2 years ago
  • The fastest way to get it working on Android is to have MediaPipe output the image. In most cases, unlike getting landmarks in other solutions, I think there is not much need to receive the segmentation mask in Unity. See the MediaPipe Video scene sample for more details.

Try the following config in the MediaPipe Video scene.

# Copyright 2019 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# GPU image. (GpuBuffer)
input_stream: "input_video"

# GPU image. (GpuBuffer)
output_stream: "output_video"

# Throttles the images flowing downstream for flow control. It passes through
# the very first incoming image unaltered, and waits for downstream nodes
# (calculators and subgraphs) in the graph to finish their tasks before it
# passes through another image. All images that come in while waiting are
# dropped, limiting the number of in-flight images in most part of the graph to
# 1. This prevents the downstream nodes from queuing up incoming images and data
# excessively, which leads to increased latency and memory usage, unwanted in
# real-time mobile applications. It also eliminates unnecessary computation,
# e.g., the output produced by a node may get dropped downstream if the
# subsequent nodes are still busy processing previous inputs.
node {
  calculator: "FlowLimiterCalculator"
  input_stream: "input_video"
  input_stream: "FINISHED:output_video"
  input_stream_info: {
    tag_index: "FINISHED"
    back_edge: true
  }
  output_stream: "throttled_input_video"
}

node: {
  calculator: "ImageTransformationCalculator"
  input_stream: "IMAGE_GPU:throttled_input_video"
  input_side_packet: "ROTATION_DEGREES:input_rotation"
  input_side_packet: "FLIP_HORIZONTALLY:input_horizontally_flipped"
  input_side_packet: "FLIP_VERTICALLY:input_vertically_flipped"
  output_stream: "IMAGE_GPU:transformed_input_video"
}

# Subgraph that performs selfie segmentation.
node {
  calculator: "SelfieSegmentationGpu"
  input_stream: "IMAGE:transformed_input_video"
  output_stream: "SEGMENTATION_MASK:segmentation_mask"
}

# Colors the selfie segmentation with the color specified in the option.
node {
  calculator: "RecolorCalculator"
  input_stream: "IMAGE_GPU:transformed_input_video"
  input_stream: "MASK_GPU:segmentation_mask"
  output_stream: "IMAGE_GPU:output_video_raw"
  node_options: {
    [type.googleapis.com/mediapipe.RecolorCalculatorOptions] {
      color { r: 0 g: 0 b: 255 }
      mask_channel: RED
      invert_mask: true
      adjust_with_luminance: false
    }
  }
}

# Flip vertically because the output image is aligned from top-left to bottom-right.
node: {
  calculator: "GlScalerCalculator"
  input_stream: "VIDEO:output_video_raw"
  input_side_packet: "ROTATION:output_rotation"
  output_stream: "VIDEO:output_video"
  node_options: {
    [type.googleapis.com/mediapipe.GlScalerCalculatorOptions] {
      flip_vertical: true
    }
  }
}

While this config could be further improved, it is close to the best you can do without writing another Native Plugin. If it does not run as fast as you want, you should look for another approach.

Note that even if the sample app runs at 30 FPS on your phone, if inference + rendering takes more than 33ms (but less than 66ms), the FPS will drop to 15, since the overall process consumes 2 frames (cf. https://github.com/homuler/MediaPipeUnityPlugin/issues/428).
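As a worked check of that arithmetic (a sketch, not plugin code):

  using UnityEngine;

  // With a 30 FPS source, each loop iteration consumes whole camera frames,
  // so the effective FPS is 30 / ceil(processingMs / 33.3).
  static float EffectiveFps(float processingMs) => 30f / Mathf.Ceil(processingMs * 30f / 1000f);

  // EffectiveFps(30f) == 30f  (fits within one frame)
  // EffectiveFps(40f) == 15f  (spans two frames)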

LoopIssuer commented 2 years ago

Thanks for the advice, @homuler.
Could you please explain the point below - should I add a SetupOutputPacket method to SelfieSegmentationGraph?

> The fastest way to get it working on Android is to have MediaPipe output the image. In most cases, unlike getting landmarks in other solutions, I think there is not much need to receive the segmentation mask in Unity. See the MediaPipe Video scene sample for more details.

homuler commented 2 years ago

Please replace the contents of official_hand_tracking_demo_opengles.txt with the config I shared and run the MediaPipe Video scene on your phone. The Selfie Segmentation solution will then run in the MediaPipe Video scene (you may need to run the Selfie Segmentation scene first to store the dependent assets on your device).

LoopIssuer commented 2 years ago

Thanks, it now works at 26 fps. However, background recognition is now worse - mostly the user is shown only down to the neck; below that, the body is cropped out. So, the last question is: are there some properties in official_hand_tracking_demo_opengles.txt to modify to get something in between the Selfie sample and the MediaPipe Video sample (with the modification)?

And how in the Video sample change background to other Render Texture?

homuler commented 2 years ago

> However, background recognition is now worse - mostly the user is shown only down to the neck; below that, the body is cropped out. So, the last question is: are there some properties in official_hand_tracking_demo_opengles.txt to modify to get something in between the Selfie sample and the MediaPipe Video sample (with the modification)?

Can you share an image to compare the result? I'm not sure, but the calculator used is exactly the same (SelfieSegmentationGpu), so I believe the accuracy is the same, too.

> And how in the Video sample change background to other Render Texture?

Sorry, I don't understand exactly what you are asking; can you please correct the grammatical errors? In the MediaPipe Video scene, MediaPipe (C++) renders the result directly to the screen texture. The pointer to the texture is passed to MediaPipe as follows: https://github.com/homuler/MediaPipeUnityPlugin/blob/adb2d908cb6e7b82950c8bfa51c493760a6293b8/Assets/MediaPipeUnity/Samples/Scenes/MediaPipe%20Video/MediaPipeVideoSolution.cs#L25-L31
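In outline, the handoff looks like this (a sketch of the idea using standard Unity APIs; see the linked lines for the actual code):

  // Sketch: obtain a Unity texture's native pointer (standard Unity API). The plugin
  // passes such a pointer to MediaPipe (C++), which renders the output into the
  // texture directly, so Unity never copies the image back.
  int width = 640, height = 360; // example size
  var outputTexture = new Texture2D(width, height, TextureFormat.RGBA32, false);
  System.IntPtr nativePtr = outputTexture.GetNativeTexturePtr();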

LoopIssuer commented 2 years ago

Hi @homuler

> And how in the Video sample change background to other Render Texture?

Sorry for not being clear enough. I would like to know if there is a simple way to implement MaskAnnotation for the MediaPipe Video sample. I need to change the segmented background color and, in its place, show another texture (a Unity RenderTexture) behind the user. It was easy in the Selfie Segmentation sample with MaskAnnotation (but in the Video sample the Unmask shader doesn't work).

> Can you share an image to compare the result?

It seems that sometimes the Android camera rotation (or maybe the webcam texture) behaves oddly - it rotates opposite to how I'm holding the device. In that case, the background cropping is worse; i.e. in the attached picture, part of the body below the neck is missing. But this is probably connected to the webcam bug on Android; I think I can handle it. When the camera is rotated correctly, the cropping is good. [image attached]

Thank you very much for all your help and effort.

homuler commented 2 years ago

> I would like to know if there is a simple way to implement MaskAnnotation for the MediaPipe Video sample.

The MediaPipe Video scene simply renders the output returned by MediaPipe, so you need to write your own Calculator to create an output image if the existing Calculators don't support your use case.

Will you tell me which line takes a long time to finish in the Selfie Segmentation scene? I suspect that if you remove one yield statement, you can run it at 30 FPS.

diff --git a/Assets/MediaPipeUnity/Samples/Common/Scripts/ImageSourceSolution.cs b/Assets/MediaPipeUnity/Samples/Common/Scripts/ImageSourceSolution.cs
index 25b9028..b539864 100644
--- a/Assets/MediaPipeUnity/Samples/Common/Scripts/ImageSourceSolution.cs
+++ b/Assets/MediaPipeUnity/Samples/Common/Scripts/ImageSourceSolution.cs
@@ -7,6 +7,8 @@
 using System.Collections;
 using UnityEngine;

+using Stopwatch = System.Diagnostics.Stopwatch;
+
 namespace Mediapipe.Unity
 {
   public abstract class ImageSourceSolution<T> : Solution where T : GraphRunner
@@ -15,6 +17,7 @@ namespace Mediapipe.Unity
     [SerializeField] protected T graphRunner;
     [SerializeField] protected TextureFramePool textureFramePool;

+    protected Stopwatch stopwatch;
     private Coroutine _coroutine;

     public RunningMode runningMode;
@@ -33,6 +36,7 @@ namespace Mediapipe.Unity
       }
       base.Play();
       _coroutine = StartCoroutine(Run());
+      stopwatch = new Stopwatch();
     }

     public override void Pause()
@@ -84,9 +88,11 @@ namespace Mediapipe.Unity
       graphRunner.StartRun(imageSource);

       var waitWhilePausing = new WaitWhile(() => isPaused);
+      stopwatch.Start();

       while (true)
       {
+        stopwatch.Restart();
         if (isPaused)
         {
           yield return waitWhilePausing;
@@ -95,19 +101,27 @@ namespace Mediapipe.Unity
         if (!textureFramePool.TryGetTextureFrame(out var textureFrame))
         {
           yield return new WaitForEndOfFrame();
+          UnityEngine.Debug.Log($"Failed to get a TextureFrame: {stopwatch.ElapsedMilliseconds}");
           continue;
         }

+        UnityEngine.Debug.Log($"Got a TextureFrame: {stopwatch.ElapsedMilliseconds}");
+
         // Copy current image to TextureFrame
         ReadFromImageSource(imageSource, textureFrame);
+        UnityEngine.Debug.Log($"Copied the next input: {stopwatch.ElapsedMilliseconds}");
         AddTextureFrameToInputStream(textureFrame);
+        UnityEngine.Debug.Log($"Sent the next input: {stopwatch.ElapsedMilliseconds}");
         yield return new WaitForEndOfFrame();

         if (runningMode.IsSynchronous())
         {
+          UnityEngine.Debug.Log($"Rendering the current frame: {stopwatch.ElapsedMilliseconds}");
           RenderCurrentFrame(textureFrame);
+          UnityEngine.Debug.Log($"Rendered the current frame: {stopwatch.ElapsedMilliseconds}");
           yield return WaitForNextValue();
         }
+        UnityEngine.Debug.Log($"loop end: {stopwatch.ElapsedMilliseconds}");
       }
     }

diff --git a/Assets/MediaPipeUnity/Samples/Scenes/Selfie Segmentation/SelfieSegmentationSolution.cs b/Assets/MediaPipeUnity/Samples/Scenes/Selfie Segmentation/SelfieSegmentationSolution.cs
index e6f8d90..a99c1a1 100644
--- a/Assets/MediaPipeUnity/Samples/Scenes/Selfie Segmentation/SelfieSegmentationSolution.cs    
+++ b/Assets/MediaPipeUnity/Samples/Scenes/Selfie Segmentation/SelfieSegmentationSolution.cs    
@@ -37,6 +37,8 @@ namespace Mediapipe.Unity.SelfieSegmentation
       if (runningMode == RunningMode.Sync)
       {
         var _ = graphRunner.TryGetNext(out segmentationMask, true);
+        UnityEngine.Debug.Log($"Got the next output: {stopwatch.ElapsedMilliseconds}");
+
       }
       else if (runningMode == RunningMode.NonBlockingSync)
       {
@@ -44,6 +46,7 @@ namespace Mediapipe.Unity.SelfieSegmentation
       }

       _segmentationMaskAnnotationController.DrawNow(segmentationMask);
+      UnityEngine.Debug.Log($"Rendered the annotation: {stopwatch.ElapsedMilliseconds}");
     }

     private void OnSegmentationMaskOutput(object stream, OutputEventArgs<ImageFrame> eventArgs)

LoopIssuer commented 2 years ago

Hi @homuler, the Editor log and Android logcat from the Selfie example are attached. Hope it helps you in helping me :)

MediaPipePlugin_AndroidLogcat.txt

MediaPipePlugin_Editor.log

LoopIssuer commented 2 years ago

Hi @homuler. Any chance of solving this performance issue? :)

homuler commented 2 years ago

At least, it's impossible to achieve 30 FPS on your device in Sync mode, since the inference step (SelfieSegmentationGraph#TryGetNext) takes about 30~40ms and consumes an entire frame (that is, each loop consumes at least 2 frames).

I would look into delaying the camera image while running the graph asynchronously, but it all depends on your requirements.
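One possible shape for that (a hypothetical sketch, not plugin code): buffer recent camera frames and display the one whose timestamp matches the mask that just arrived, trading a constant delay for alignment.

  using System.Collections.Generic;
  using UnityEngine;

  // Hypothetical helper: delays the displayed camera image so it lines up with the
  // asynchronously produced segmentation mask.
  public class DelayedFrameBuffer
  {
    private readonly Queue<(long timestampUs, Texture frame)> _frames = new Queue<(long, Texture)>();

    public void AddCameraFrame(long timestampUs, Texture frame) => _frames.Enqueue((timestampUs, frame));

    // Called when a mask with the given timestamp arrives; returns the newest frame
    // that is not newer than the mask, or null if none is buffered.
    public Texture PopFrameFor(long maskTimestampUs)
    {
      Texture match = null;
      while (_frames.Count > 0 && _frames.Peek().timestampUs <= maskTimestampUs)
      {
        match = _frames.Dequeue().frame;
      }
      return match;
    }
  }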

LoopIssuer commented 2 years ago

Hi @homuler, is there a simple way to make the background transparent in the Video sample, with your modification (https://github.com/homuler/MediaPipeUnityPlugin/issues/670#issuecomment-1195299615)?

Currently it is blue, but I would like to show only the user's image from the web camera, without any background.

Thanks in advance for the answer, and for all your help.