homuler / MediaPipeUnityPlugin

Unity plugin to run MediaPipe
MIT License

Usage of GpuBuffer with WebCamTexture / support for iOS Metal API #768

Open Paxios opened 2 years ago

Paxios commented 2 years ago

Plugin Version or Commit ID

v0.10.1

Unity Version

2022.2

Your Host OS

macOS Monterey 12.6

Target Platform

Android, iOS

Description

1. Is it possible to use GpuBuffer with WebCamTexture?

I tried creating a GpuBuffer with Texture.GetNativeTexturePtr. It compiled fine and the app never crashed, but TryGetNext never returns true; it feels like a result is never produced. There is also no exception thrown, even with Logcat attached to the device and a MediaPipe debug build. I used https://github.com/homuler/MediaPipeUnityPlugin/wiki/API-Overview#gpubuffer, https://github.com/homuler/MediaPipeUnityPlugin/blob/92cad4bbe9ba52514034c52ac5b5f0a99accab06/Assets/Mediapipe/Samples/Scripts/DemoGraph.cs#L99-L112, https://github.com/homuler/MediaPipeUnityPlugin/blob/18e00a85ac271b123178e593184184e8715ed22e/Assets/MediaPipe/Examples/Scripts/DemoGraph.cs#L54-L63 and https://github.com/homuler/MediaPipeUnityPlugin/issues/13 as references.

I saw that there are different texture formats (SRGBA from WebCamTexture and BGRA for GpuBuffer); could this be the problem? https://github.com/homuler/MediaPipeUnityPlugin/blob/6b8c6743f23539f7604e74dc260b01e0f58f1707/Assets/MediaPipeUnity/Samples/Common/Scripts/ImageSourceSolution.cs#L72

My goal is to skip copying data from the GPU to the CPU and then passing it back to the GPU (MediaPipe), since it is a huge bottleneck on older devices. I also tried reading the WebCamTexture's pixels from a background thread, but it can only be read from the main thread. The calculator graph I used: https://github.com/homuler/MediaPipeUnityPlugin/blob/6b8c6743f23539f7604e74dc260b01e0f58f1707/Assets/MediaPipeUnity/Samples/Scenes/Pose%20Tracking/pose_tracking_opengles.txt

2. Support for Metal API

I looked around the repo, but I couldn't figure out whether there is support for the Metal API. If not, is it planned for the future?

Code to Reproduce the issue

No response

Additional Context

No response

homuler commented 2 years ago

Please don't leave the Code to Reproduce the issue field blank.

My goal is to skip copying data from the GPU to the CPU and then passing it back to the GPU (MediaPipe), since it is a huge bottleneck on older devices.

DemoGraph is a very old implementation. See https://github.com/homuler/MediaPipeUnityPlugin/issues/435#issuecomment-1022752459 instead.

I looked around the repo, but I couldn't figure out whether there is support for the Metal API. If not, is it planned for the future?

What kind of support do you expect? At least, you can use it as the graphics API.

Paxios commented 2 years ago

Sorry for the late response; I was trying to implement your suggestion.

GpuBuffer

In the Estimator class, I added two logs so it's visible which part of the code gets executed and which does not. There is no exception thrown or anything, and I never receive a result from the graph. Is there something I'm doing wrong?

Code

Below is the relevant code.

CameraManager:

var selectedWebCam = WebCamTexture.devices[0];
WebCamTexture = new WebCamTexture(selectedWebCam.name, requestedHeight: 160, requestedWidth: 160);
WebCamTexture.Play();

EstimationManager (MonoBehaviour):

private void Update()
{
    Estimator.MakeEstimation(CameraManager.WebCamTexture.width, CameraManager.WebCamTexture.height, CameraManager.WebCamTexture);
}

Estimator:

public void MakeEstimation(int width, int height, WebCamTexture texture)
{
    if (texture == null || width < 100)
        return;

    TextureFramePool.ResizeTexture(width, height, TextureFormat.RGBA32);
    if (!TextureFramePool.TryGetTextureFrame(out var textureFrame))
        return;

    textureFrame.ReadTextureFromOnGPU(texture);
    // I also tried with TextureFrame#ReadTextureFromOnCPU(texture)
    var gpuBuffer = textureFrame.BuildGpuBuffer(GpuManager.GlCalculatorHelper.GetGlContext());
    Debug.Log("This is logged");
    // currentMicroSeconds: a monotonically increasing timestamp in microseconds, computed elsewhere
    Graph.AddPacketToInputStream("input_video", new GpuBufferPacket(gpuBuffer, new Timestamp(currentMicroSeconds))).AssertOk();
    if (_outputLandmarksStream.TryGetNext(out var landmarkList))
    {
        Debug.Log("This is NOT logged");
        [...]
    }
}

Graph that I use:

input_stream: "input_video"
output_stream: "pose_landmarks"
output_stream: "pose_world_landmarks"
node {
  calculator: "FlowLimiterCalculator"
  input_stream: "input_video"
  input_stream: "FINISHED:pose_landmarks"
  input_stream_info: {
    tag_index: "FINISHED"
    back_edge: true
  }
  output_stream: "throttled_input_video"
}
node: {
  calculator: "ImageTransformationCalculator"
  input_stream: "IMAGE_GPU:throttled_input_video"
  input_side_packet: "ROTATION_DEGREES:input_rotation"
  input_side_packet: "FLIP_HORIZONTALLY:input_horizontally_flipped"
  input_side_packet: "FLIP_VERTICALLY:input_vertically_flipped"
  output_stream: "IMAGE_GPU:transformed_input_video"
}
node {
  calculator: "PoseLandmarkGpu"
  input_stream: "IMAGE:transformed_input_video"
  input_side_packet: "MODEL_COMPLEXITY:model_complexity"
  input_side_packet: "SMOOTH_LANDMARKS:smooth_landmarks"
  input_side_packet: "ENABLE_SEGMENTATION:enable_segmentation"
  input_side_packet: "SMOOTH_SEGMENTATION:smooth_segmentation"
  output_stream: "LANDMARKS:pose_landmarks"
  output_stream: "WORLD_LANDMARKS:pose_world_landmarks"
}
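
For completeness, the input side packets declared in the graph above have to be supplied when the graph starts. A minimal sketch following the pattern used in the plugin's sample solutions; the concrete values here are illustrative assumptions, not what the reporter necessarily used:

var sidePacket = new SidePacket();
sidePacket.Emplace("input_rotation", new IntPacket(0));
sidePacket.Emplace("input_horizontally_flipped", new BoolPacket(false));
sidePacket.Emplace("input_vertically_flipped", new BoolPacket(false));
sidePacket.Emplace("model_complexity", new IntPacket(1));
sidePacket.Emplace("smooth_landmarks", new BoolPacket(true));
sidePacket.Emplace("enable_segmentation", new BoolPacket(false));
sidePacket.Emplace("smooth_segmentation", new BoolPacket(true));
Graph.StartRun(sidePacket).AssertOk();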

Working case

If I use TextureFromCamera.SetPixels32(WebCamTexture.GetPixels32()); and create a new ImageFrame from it with:

var imageFrame = new ImageFrame(
    ImageFormat.Types.Format.Srgba,
    TextureFromCamera.width,
    TextureFromCamera.height,
    TextureFromCamera.width * 4, // widthStep: 4 bytes per SRGBA pixel
    TextureFromCamera.GetRawTextureData<byte>());

then pass this ImageFrame to the graph, it works (obviously, I change the graph to expect an ImageFrame instead of a GpuBuffer).

Metal

For Metal support, I meant whether it's possible to pass a Metal texture pointer to MediaPipe like we're doing with GLES. The reason for this is that we do not want to pass the texture from GPU to CPU and back to the GPU.

homuler commented 2 years ago
textureFrame.ReadTextureFromOnGPU(texture);

What is the return value? (if it returned false, it means it failed)

For Metal support, I meant whether it's possible to pass a Metal texture pointer to MediaPipe like we're doing with GLES. The reason for this is that we do not want to pass the texture from GPU to CPU and back to the GPU.

I am aware that this problem exists, and I'd like to implement the feature if I had unlimited time, but it's not really a high priority because I'm not sure it would really improve the plugin's performance (if the sample app runs at 60fps and the inference step takes less than 1/60 sec, it may not make much difference, if any). If you can demonstrate that there's really a performance hit in that area (e.g. it performs worse than the official iOS sample app), the priority will become higher.

Paxios commented 2 years ago

textureFrame.ReadTextureFromOnGPU(texture); returns true.

I haven't yet tried the app on newer iOS devices, so it's possible that Metal support won't be needed, as you said 😄.

Paxios commented 2 years ago

Do you maybe have/use some community channel, like a Discord group?

homuler commented 2 years ago

textureFrame.ReadTextureFromOnGPU(texture); returns true.

Hmm, I don't know. On my Android device, it certainly works when I apply the patch below.

diff --git a/Assets/MediaPipeUnity/Samples/Common/Scripts/Solution.cs b/Assets/MediaPipeUnity/Samples/Common/Scripts/Solution.cs
index 813c66a..a6f4322 100644
--- a/Assets/MediaPipeUnity/Samples/Common/Scripts/Solution.cs
+++ b/Assets/MediaPipeUnity/Samples/Common/Scripts/Solution.cs
@@ -76,7 +76,7 @@ namespace Mediapipe.Unity

       if (textureType == typeof(WebCamTexture))
       {
-        textureFrame.ReadTextureFromOnCPU((WebCamTexture)sourceTexture);
+        textureFrame.ReadTextureFromOnGPU((WebCamTexture)sourceTexture);
       }
       else if (textureType == typeof(Texture2D))
       {

The possible reasons I can come up with are:

so it's possible that Metal support for this won't be possible as you said

To be precise, supporting Metal itself is possible, but it's not a high priority for now.

Do you maybe have/use some community channel like discord group?

No, I don't.

Paxios commented 2 years ago

The settings are shown below. Should I maybe set one of the ES versions as required?

[screenshot: Android Player Settings, Graphics APIs]

I will experiment with the WebCamTexture's format and the GLES context.

Paxios commented 2 years ago

By the way, the TextureFormat of the camera is "R8G8B8A8_UNorm". So I guess this could be the problem, since it's RGBA32 instead of ARGB32?

Also, note that ReadTextureFromOnCPU doesn't work either, so I guess there's something wrong with my implementation 😄

homuler commented 2 years ago

Settings are shown below, should I maybe set any of the ES version to be required?

OpenGL ES 3.2 is required to share the context with MediaPipe (that's why even ReadTextureFromOnCPU doesn't work). I strongly recommend you first modify and test the sample app before writing your own code.
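
As a quick sanity check, you can log which graphics API and GLES version the player actually initialized with. This is plain Unity SystemInfo usage, not plugin code:

using UnityEngine;

// Logs the active graphics API and driver version at startup.
public class GraphicsApiLogger : MonoBehaviour
{
    private void Start()
    {
        Debug.Log($"Graphics API: {SystemInfo.graphicsDeviceType}");      // e.g. OpenGLES3
        Debug.Log($"Driver version: {SystemInfo.graphicsDeviceVersion}"); // e.g. "OpenGL ES 3.2 ..."
    }
}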

Paxios commented 2 years ago

Hey there once more. I looked into the code more deeply than before, and I can't find any usage of ReadTextureFromOnGPU in the sample app. Is it because of the high latency you mentioned in the comment? If I switch from ReadTextureFromOnCPU to ReadTextureFromOnGPU in the Solution class, no estimations are returned from the graph (just as in my app).

I would very much appreciate it if you could verify that ReadTextureFromOnGPU works on your side.

Usage of ReadTextureFromOnCPU

https://github.com/homuler/MediaPipeUnityPlugin/blob/master/Assets/MediaPipeUnity/Samples/Common/Scripts/Solution.cs#L69-L89

I tested it on two GPUs, an ARM Mali-G71 MP20 and an Xclipse 920, both of which support OpenGL ES 3.2.

homuler commented 2 years ago

I would very much appreciate it if you could verify that ReadTextureFromOnGPU works on your side.

I confirmed (see https://github.com/homuler/MediaPipeUnityPlugin/issues/768#issuecomment-1279975261).

homuler commented 2 years ago

I think you should display the target texture (after calling ReadTextureFromOnGPU) on the screen first.

Paxios commented 2 years ago

I would very much appreciate it if you could verify that ReadTextureFromOnGPU works on your side.

I confirmed (see #768 (comment)).

Mind if I ask which device you tried it on, or which GPU it uses?

homuler commented 2 years ago

Pixel 6. At any rate, I think you should check if the pixel data is actually copied on GPU (cf. https://github.com/homuler/MediaPipeUnityPlugin/issues/768#issuecomment-1282360996).

Paxios commented 2 years ago

Sorry for the late response.

textureFrame.ReadTextureFromOnGPU(texture);
texture2DToDisplay.SetPixels32(textureFrame.GetPixels32());
texture2DToDisplay.Apply();
RawImage.texture = texture2DToDisplay;

Here, texture is the WebCamTexture. This does display the correct image in the RawImage on the screen.

The following is the process of applying TextureFrame to the graph.

var gpuBuffer = textureFrame.BuildGpuBuffer(GpuManager.GlCalculatorHelper.GetGlContext());
_graph.AddPacketToInputStream("input_video", new GpuBufferPacket(gpuBuffer, new Timestamp(currentMicroSeconds))).AssertOk();

Output processing:

_outputLandmarksStream = new OutputStream<LandmarkListPacket, LandmarkList>(_graph, OutputNodeName);
_outputLandmarksStream.AddListener(OnPoseWorldLandmarksOutput);

private void OnPoseWorldLandmarksOutput(object stream, OutputEventArgs<LandmarkList> eventArgs)
{
    if (eventArgs.value != null)
    {
        // ... I use eventArgs.value in here ...
    }
}

This code works if I change ReadTextureFromOnGPU to ReadTextureFromOnCPU.

homuler commented 2 years ago

Which log is output on your device? (You may need to build your APK with Development Build checked.) https://github.com/homuler/MediaPipeUnityPlugin/blob/391d7d98b127ce41ceac85ec47f6126664f1bc4e/Packages/com.github.homuler.mediapipe/Runtime/Scripts/Unity/GpuManager.cs#L78-L87

This does actually display the correct image in the RawImage on the screen.

Hmm, interesting. I guess this code wouldn't work on my Pixel 6, though (I'd need to do RawImage.texture = textureFrame._texture instead). In general, Graphics.CopyTexture only works on the GPU. When I used Unity 2020.3.x, the data on the CPU was invalidated after calling Graphics.CopyTexture (cf. https://forum.unity.com/threads/graphics-copytexture-then-getpixels.482601/); I haven't tested this with Unity 2021.3.x yet.
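
A minimal sketch of the pitfall described above, assuming source and dest have matching sizes and compatible formats (all names here are illustrative, not plugin code):

using UnityEngine;
using UnityEngine.UI;

public static class CopyTextureNote
{
    public static void Show(WebCamTexture source, Texture2D dest, RawImage rawImage)
    {
        // Graphics.CopyTexture copies on the GPU only; the CPU-side pixel data
        // of `dest` is not updated, so GetPixels32() may return stale data afterwards.
        Graphics.CopyTexture(source, dest);

        // Displaying `dest` directly samples the GPU copy, so this shows the frame...
        rawImage.texture = dest;

        // ...whereas a CPU round-trip may not:
        // var pixels = dest.GetPixels32(); // possibly stale after CopyTexture
    }
}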

Paxios commented 2 years ago

Hey, I just tried modifying your code as follows:

if (textureType == typeof(WebCamTexture))
{
    textureFrame.ReadTextureFromOnCPU((WebCamTexture)sourceTexture);
    if (textureFrame._texture != null)
        rawImage.texture = textureFrame._texture;
}

This does show the camera preview on the screen, but if I use ReadTextureFromOnGPU, it doesn't. So I guess there's some problem setting _texture in the TextureFrame. ReadTextureFromOnGPU returns true, so I don't know what would cause this.

Which log is output on your device? (You may need to build your APK with Development Build checked.)

The output is the following:

Unity GpuManager: EGL context is found: 511835274624

Paxios commented 2 years ago

Some additional information: I'm using Unity 2022.2.0 (as per your advice in https://github.com/homuler/MediaPipeUnityPlugin/issues/760). ReadTextureFromOnGPU uses Graphics.CopyTexture.

I checked additional data in ReadTextureFromOnGPU:

srcFormat (WebCamTexture): R8G8B8A8_UNorm
thisFormat (Texture2D): RGBA32
Width & height match on both
SystemInfo.copyTextureSupport returns: Basic, Copy3D, DifferentTypes, TextureToRT, RTToTexture
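
The values above could be gathered with logging along these lines inside ReadTextureFromOnGPU (hypothetical placement; the local names are illustrative):

// `sourceTexture` is the WebCamTexture, `_texture` the pool's Texture2D.
Debug.Log($"srcFormat: {sourceTexture.graphicsFormat}, dstFormat: {_texture.format}");
Debug.Log($"size: {sourceTexture.width}x{sourceTexture.height} -> {_texture.width}x{_texture.height}");
Debug.Log($"copyTextureSupport: {SystemInfo.copyTextureSupport}");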

I have also tested your sample app on a Pixel 6 with ReadTextureFromOnGPU set, and the outcome is the same.

Paxios commented 2 years ago

I managed to find the cause of this issue 😅
https://github.com/homuler/MediaPipeUnityPlugin/blob/6b8c6743f23539f7604e74dc260b01e0f58f1707/Assets/MediaPipeUnity/Samples/Common/Scripts/ImageSourceSolution.cs#L71-L73
I had to change the format of the pool from TextureFormat.RGBA32 to TextureFormat.ARGB32. I think there's a typo in the comment :)

Also, this works on GLES 3.1, so 3.2 is not mandatory.

homuler commented 2 years ago

I had to change the format of the pool from TextureFormat.RGBA32 to TextureFormat.ARGB32. I think there's a typo in the comment :)

So it seems that the cause was:

Your device's WebCamTexture format is not ARGB32 and the channels of the converted image are not aligned as MediaPipe expects.

The following comment is not a typo (cf. https://github.com/google/mediapipe/blob/7a6ae97a0ef298014aaf5e1370cb6f8237f2ac21/mediapipe/gpu/gpu_buffer_format.cc#L64-L78).

// When using GpuBuffer, MediaPipe assumes that the input format is BGRA, so the following code must be fixed.

However, at least in Unity, this assumption does not always hold (the input format can be RGBA or ARGB, etc.). Currently, this issue can be avoided by changing the texture format as you did (but it's not intuitive which format should be used).
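
For reference, a minimal sketch of the workaround on such devices, reusing the pool API shown earlier in this thread (which format is correct remains device-dependent):

// Request ARGB32 instead of RGBA32 so the channel order produced by the GPU
// copy matches what MediaPipe reads as BGRA on this particular device.
TextureFramePool.ResizeTexture(width, height, TextureFormat.ARGB32);
if (TextureFramePool.TryGetTextureFrame(out var textureFrame))
{
    textureFrame.ReadTextureFromOnGPU(texture);
    var gpuBuffer = textureFrame.BuildGpuBuffer(GpuManager.GlCalculatorHelper.GetGlContext());
    // ... send gpuBuffer to the graph as before
}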

Also, this works on GLES 3.1, so 3.2 is not mandatory.

Indeed, I was wrong about this, and it seems that OpenGL ES 3.2 is not required to create a context. https://github.com/google/mediapipe/blob/7a6ae97a0ef298014aaf5e1370cb6f8237f2ac21/mediapipe/gpu/gl_context_egl.cc#L110-L171

Paxios commented 2 years ago

So it seems that the cause was:

Yes, that was the cause :)

The following comment is not a typo

Ah okay 👍🏼, I wasn't sure.

Is there any way to not block the CPU while TextureFramePool executes TextureFrame#WaitUntilReleased?
https://github.com/homuler/MediaPipeUnityPlugin/blob/6b8c6743f23539f7604e74dc260b01e0f58f1707/Assets/MediaPipeUnity/Samples/Common/Scripts/ImageSource/TextureFramePool.cs#L121
https://github.com/homuler/MediaPipeUnityPlugin/blob/6b8c6743f23539f7604e74dc260b01e0f58f1707/Assets/MediaPipeUnity/Samples/Common/Scripts/ImageSource/TextureFrame.cs#L268-L281
Or will this desynchronize the GPU & CPU and result in uncontrollable crashes? I'd like to achieve relatively smooth performance on old devices (e.g. a Samsung Galaxy J7). Currently this causes 90-100 ms of lag on average.

homuler commented 2 years ago

Currently this causes 90-100 ms of lag on average.

Do you mean _glSyncToken.Wait() takes 90-100ms? If so, how did you measure it?

Paxios commented 2 years ago

Currently this causes 90-100 ms of lag on average.

Do you mean _glSyncToken.Wait() takes 90-100ms? If so, how did you measure it?

Yes, that's correct. I measured it with Unity's deep profiling.

[screenshot: Unity profiler]

It's far less on modern devices (10-25 ms).
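
For what it's worth, a cheaper way to sanity-check that number without deep profiling would be a Stopwatch around the wait. Hypothetical instrumentation inside TextureFrame, around the call referenced above:

using System.Diagnostics;

// Hypothetical placement around the GL sync wait discussed in this thread.
var stopwatch = Stopwatch.StartNew();
_glSyncToken.Wait();
stopwatch.Stop();
UnityEngine.Debug.Log($"_glSyncToken.Wait() took {stopwatch.ElapsedMilliseconds} ms");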

homuler commented 2 years ago

Does changing TextureFramePool._poolSize (e.g. to 20) make any difference? https://github.com/homuler/MediaPipeUnityPlugin/blob/391d7d98b127ce41ceac85ec47f6126664f1bc4e/Assets/MediaPipeUnity/Samples/Common/Scripts/ImageSource/TextureFramePool.cs#L18

Paxios commented 2 years ago

No, not at all; it just delays it a bit. I once set it to 10000 (and changed the GlobalInstanceTable size accordingly) and that delay didn't happen, but the game crashed, I guess because there weren't enough resources for 10k textures.

tealm commented 2 years ago

I am also seeing very slow results on my device (an Android Galaxy 04, built with SDK 28 and minVersion set to OpenGL ES 3.1). Do you have any idea why the latency is high when the image is copied on the GPU, as stated in your comment? // For some reason, when the image is copied on GPU, latency tends to be high.

A profiler screenshot from running the Hand Tracking sample on Android shows that it does take time to read the image from the GPU.

[screenshots: Unity profiler]

dayowoo commented 1 year ago

@tealm I have the same question. Did you get an answer? Thank you for checking my message.