homuler / MediaPipeUnityPlugin

Unity plugin to run MediaPipe

Documentation for Holistic Scene #650

aadityasp opened this issue 2 years ago (Open)

aadityasp commented 2 years ago

Plugin Version or Commit ID

v0.10.0

Unity Version

2021.3.4f1

Your Host OS

macOS Monterey 12.4

Target Platform

Mac Standalone

Description

Hi, I'm trying to build multi-person tracking using this plugin. Similar to the multiple hand tracking scene, I would like to create a multi-person tracking scene with holistic landmarks overlaid on every tracked individual.

I understand MediaPipe supports only single-person tracking in its holistic solution. So I'm trying to run a YOLO/SSD model to get the two best detections, and then run the holistic graph on top of those two bounding boxes. To do that, I would need to pass a small region of the image to the holistic graph. Is that possible with the existing implementation of the holistic scene? If yes, could you please guide me on how to pass an image to the holistic graph and get back an output image with holistic landmarks?

So far, I have an SSD model running inside Unity; it takes the camera feed and returns the bounding boxes of the top 2 detections. I'm stuck on how to use this plugin to pass these SSD-detected regions to the holistic graph and extract holistic landmarks for each bounding box, as sketched below.
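For context, the cropping step I have in mind is roughly the following (a rough sketch, not working code; CropRegion is just a name I made up, and I'm assuming the bounding boxes are in pixel coordinates):

// Rough sketch (assumes `using UnityEngine;`): copies the pixels inside a
// detected bounding box into a smaller buffer that could then be wrapped
// in an ImageFrame. "CropRegion" is a hypothetical helper, not plugin API.
private Color32[] CropRegion(Color32[] fullPixels, int fullWidth, RectInt box)
{
  var cropped = new Color32[box.width * box.height];
  for (var row = 0; row < box.height; row++)
  {
    // Copy one row of the box from the full frame into the crop buffer.
    System.Array.Copy(
      fullPixels, (box.y + row) * fullWidth + box.x,
      cropped, row * box.width,
      box.width);
  }
  return cropped;
}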

I really appreciate this wonderful plugin! However, I wasn't able to find any documentation on how to use the scenes or on what each function in the solution scripts does. I have been stuck on this issue for over 2 weeks and would really appreciate any support. Thank you!

Code to Reproduce the issue

No response

Additional Context

I was able to build multi-person tracking with a similar approach using the MediaPipe Python API: I ran an SSD model, took the top detections as bounding boxes, and ran MediaPipe pose detection on every bounding box to overlay holistic landmarks on all of them. It worked fine but ran at a low FPS.

I would like to do the same thing inside Unity, and thankfully I came across your plugin, but I'm having trouble figuring out how to send the bounding-box regions of the image as input to the holistic graph to get the holistic landmarks. I'd really appreciate any help. Thank you!

homuler commented 2 years ago

I wasn't able to find any documentation on how to use the scenes or on what each function in the solution scripts does.

Have you tried the tutorial?

aadityasp commented 2 years ago

Yes, I did try the tutorial, but I get a blank screen when running the following code from the tutorial:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.UI;
using Mediapipe.Unity.CoordinateSystem;

using Stopwatch = System.Diagnostics.Stopwatch;

namespace Mediapipe.Unity.Tutorial
{
  public class FaceMesh : MonoBehaviour
  {
    [SerializeField] private TextAsset _configAsset;
    [SerializeField] private RawImage _screen;
    [SerializeField] private int _width;
    [SerializeField] private int _height;
    [SerializeField] private int _fps;

    private CalculatorGraph _graph;
    private ResourceManager _resourceManager;

    private WebCamTexture _webCamTexture;
    private Texture2D _inputTexture;
    private Color32[] _inputPixelData;
    private Texture2D _outputTexture;
    private Color32[] _outputPixelData;

    private IEnumerator Start()
    {
      if (WebCamTexture.devices.Length == 0)
      {
        throw new System.Exception("Web Camera devices are not found");
      }
      var webCamDevice = WebCamTexture.devices[0];
      _webCamTexture = new WebCamTexture(webCamDevice.name, _width, _height, _fps);
      _webCamTexture.Play();

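      // Wait for the webcam to report its actual resolution (it starts out as 16x16 until the first frame arrives)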
      yield return new WaitUntil(() => _webCamTexture.width > 16);

      _screen.rectTransform.sizeDelta = new Vector2(_width, _height);

      _inputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
      _inputPixelData = new Color32[_width * _height];
      _outputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
      _outputPixelData = new Color32[_width * _height];

      _screen.texture = _outputTexture;

      _resourceManager = new StreamingAssetsResourceManager();
      yield return _resourceManager.PrepareAssetAsync("face_detection_short_range.bytes");
      yield return _resourceManager.PrepareAssetAsync("face_landmark_with_attention.bytes");

      var stopwatch = new Stopwatch();

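      // Build the CalculatorGraph from the .pbtxt config and poll both of its output streams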
      _graph = new CalculatorGraph(_configAsset.text);
      var outputVideoStream = new OutputStream<ImageFramePacket, ImageFrame>(_graph, "output_video");
      var multiFaceLandmarksStream = new OutputStream<NormalizedLandmarkListVectorPacket, List<NormalizedLandmarkList>>(_graph, "multi_face_landmarks");
      outputVideoStream.StartPolling().AssertOk();
      multiFaceLandmarksStream.StartPolling().AssertOk();
      _graph.StartRun().AssertOk();
      stopwatch.Start();

      var screenRect = _screen.GetComponent<RectTransform>().rect;

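      // Main loop: feed webcam frames into the graph and consume its outputs each frame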
      while (true)
      {
        _inputTexture.SetPixels32(_webCamTexture.GetPixels32(_inputPixelData));
        var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, _width, _height, _width * 4, _inputTexture.GetRawTextureData<byte>());
        var currentTimestamp = stopwatch.ElapsedTicks / (System.TimeSpan.TicksPerMillisecond / 1000);
        _graph.AddPacketToInputStream("input_video", new ImageFramePacket(imageFrame, new Timestamp(currentTimestamp))).AssertOk();

        yield return new WaitForEndOfFrame();

        if (outputVideoStream.TryGetNext(out var outputVideo))
        {
          if (outputVideo.TryReadPixelData(_outputPixelData))
          {
            _outputTexture.SetPixels32(_outputPixelData);
            _outputTexture.Apply();
          }
        }

        if (multiFaceLandmarksStream.TryGetNext(out var multiFaceLandmarks))
        {
          if (multiFaceLandmarks != null && multiFaceLandmarks.Count > 0)
          {
            foreach (var landmarks in multiFaceLandmarks)
            {
              // top of the head
              var topOfHead = landmarks.Landmark[10];
              Debug.Log($"Unity Local Coordinates: {screenRect.GetPoint(topOfHead)}, Image Coordinates: {topOfHead}");
            }
          }
        }
      }
    }

    private void OnDestroy()
    {
      if (_webCamTexture != null)
      {
        _webCamTexture.Stop();
      }

      if (_graph != null)
      {
        try
        {
          _graph.CloseInputStream("input_video").AssertOk();
          _graph.WaitUntilDone().AssertOk();
        }
        finally
        {
          _graph.Dispose();
        }
      }
    }
  }
}
[Screenshot: mediapipe_plugin_tutorial]

I did not get any errors while building or running, but the output is just a blank screen, so I couldn't follow the tutorial any further.

homuler commented 2 years ago

I think you should try to run it in the Unity Editor first. In the meantime, have you checked Player.log?

homuler commented 2 years ago

By the way, if the cause is that you forgot to set _configAsset, I suspect you're only trying the final step. If so, I strongly recommend you follow the instructions from the first step (Hello World).

If you have any questions or difficulties in understanding anything in the tutorial, please do not hesitate to ask! However, please don't copy and paste code without reading what is written and then ask questions just because it doesn't work.

aadityasp commented 2 years ago

Thanks for the response. I did follow every step of the official solution in the tutorial. I set _configAsset to face_mesh_desktop_live, after which I got this output. The reason I'm not able to run it in the Unity Editor is that I get the following error: "DllNotFoundException: mediapipe_c assembly: type: member:(null) Mediapipe.UnsafeNativeMethods..cctor () (at Packages/com.github.homuler.mediapipe/Runtime/Scripts/PInvoke/UnsafeNativeMethods.cs:29)"

[Screenshot: MediapipiePluginError]

I'm using an M1 Mac with Monterey v12.4.

Just to be sure, I pulled the repo fresh and rebuilt it. The build command ran without any errors, and I used the Unity Editor (Apple silicon, 2022.1.5f1) to open the project. I'm able to build for iOS, Android, and Mac, but unable to run in the Editor due to the above error.

homuler commented 2 years ago

See https://github.com/homuler/MediaPipeUnityPlugin/issues/640#issuecomment-1174868942 and https://github.com/homuler/MediaPipeUnityPlugin/issues/640#issuecomment-1175718450 or download the built package from the release page.

aadityasp commented 2 years ago

Thanks a lot for the built package! I'm able to run the scenes in the Editor now.

A quick question about the Annotations tutorial: is there a way to annotate the landmarks from multiple graphs onto a single raw image? If yes, how do I align these annotations?

Currently, I understand from the tutorial that you get the WebCamTexture, convert it to a Texture2D, get the raw texture data from it, and pass that to the MediaPipe graph. You get the multiFaceLandmarks from the graph's output stream, and when you call the DrawNow function it draws the landmarks on top of the screen.

But I'm trying to take 2 regions from the input Texture2D (say, the left half and the right half), and I would like to run 2 different MediaPipe graphs on top of these regions. I would also have 2 MultiFaceLandmarkListAnnotationControllers, but I want these 2 annotations to be translated onto a single screen.

I guess where I'm confused is whether there is a way to pass the raw texture data of a Rect within a texture as input to the graph, and whether there is a way to specify the RectTransform where the output annotations should appear on the screen.
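In code terms, what I'm imagining for the first part is roughly this (a sketch based on the tutorial's fields; I haven't verified that it works):

// Sketch: read only the left half of the input texture and build an
// ImageFrame from it, reusing the RGBA32 setup from the tutorial.
var halfWidth = _width / 2;
var leftPixels = _inputTexture.GetPixels(0, 0, halfWidth, _height);
var leftTexture = new Texture2D(halfWidth, _height, TextureFormat.RGBA32, false);
leftTexture.SetPixels(leftPixels);
var leftFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, halfWidth, _height, halfWidth * 4, leftTexture.GetRawTextureData<byte>());
// ...and similarly for the right half, fed into a second graph.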

homuler commented 2 years ago

Sorry, I don't have time right now, so I will answer briefly.

Is there a way to annotate the landmarks from multiple graphs onto a single raw image? If yes, how do I align these annotations?

You can attach multiple AnnotationControllers to the Annotation Layer, and they don't care which CalculatorGraph the input comes from, so the answer is yes. Please check each scene to see how they actually work.
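For example, something like this (the stream and controller names are illustrative):

// Two controllers attached to the same annotation layer, each fed from a
// different graph's output stream.
if (leftFaceLandmarksStream.TryGetNext(out var leftLandmarks))
{
  _leftAnnotationController.DrawNow(leftLandmarks);
}
if (rightFaceLandmarksStream.TryGetNext(out var rightLandmarks))
{
  _rightAnnotationController.DrawNow(rightLandmarks);
}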

I would also have 2 MultiFaceLandmarkListAnnotationControllers, but I want these 2 annotations to be translated onto a single screen.

I think you can use LandmarkProjectionCalculator to project output landmarks to the original coordinates.
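For reference, MediaPipe's own graphs (e.g. face_landmark_gpu.pbtxt) use it roughly like this (stream names are illustrative):

# Projects landmarks detected on a cropped region back onto the
# coordinate system of the full image.
node {
  calculator: "LandmarkProjectionCalculator"
  input_stream: "NORM_LANDMARKS:landmarks"
  input_stream: "NORM_RECT:rect"
  output_stream: "NORM_LANDMARKS:projected_landmarks"
}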

I suppose it is possible to detect multiple people using a single CalculatorGraph, but I don't have time to verify this and describe the results (See face_landmark_front_gpu.pbtxt and pose_landmark_gpu.pbtxt).

aadityasp commented 2 years ago

Thanks for these resources, homuler. I'm very new to MediaPipe, and I really appreciate your patience. I see that the LandmarkProjectionCalculator needs a normalized rect on its input stream. Can you point me to resources on how to pass a RectTransform of a raw image to this input stream?

I suppose it is possible to detect multiple people using a single CalculatorGraph, but I don't have time to verify this and describe the results (See face_landmark_front_gpu.pbtxt and pose_landmark_gpu.pbtxt).

Interesting! I understand that the MediaPipe detector outputs multiple detections, but only the best one is selected. But how do I change these ranges and test it via this plugin? I could not find pose_landmark_gpu.pbtxt in this package.

aadityasp commented 2 years ago

[Screenshot: Screen Shot 2022-07-18 at 2.41.48 PM]

Is this the correct way to use the LandmarkProjectionCalculator? If yes, how can I pass a RectTransform of a raw image to the "rect" input stream using your API?

homuler commented 2 years ago

But how do I change these ranges and test it via this plugin? I could not find pose_landmark_gpu.pbtxt in this package.

You can copy the contents of pose_landmark_gpu.pbtxt and replace PoseLandmarkGpu with it (note that you may need to change some of the stream names).

Is this the correct way to use the LandmarkProjectionCalculator? If yes, how can I pass a RectTransform of a raw image to the "rect" input stream using your API?

Ah... maybe it's correct, but RectPacket cannot currently be used for input packets, so you would need to implement some APIs in C++ (cf. https://github.com/homuler/MediaPipeUnityPlugin/issues/632#issuecomment-1171816808).
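Until then, one workaround is to do the projection on the C# side after receiving the landmarks, since the math is simple. A sketch (ProjectToFullImage is a hypothetical helper; cropRect is the crop in normalized full-image coordinates):

// Maps landmarks detected on a cropped region back to full-image
// normalized coordinates, mirroring what LandmarkProjectionCalculator
// does for unrotated rects.
public static NormalizedLandmarkList ProjectToFullImage(NormalizedLandmarkList landmarks, UnityEngine.Rect cropRect)
{
  var projected = new NormalizedLandmarkList();
  foreach (var landmark in landmarks.Landmark)
  {
    projected.Landmark.Add(new NormalizedLandmark
    {
      X = cropRect.x + landmark.X * cropRect.width,
      Y = cropRect.y + landmark.Y * cropRect.height,
      Z = landmark.Z * cropRect.width, // depth scales with the rect width
    });
  }
  return projected;
}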