homuler / MediaPipeUnityPlugin

Unity plugin to run MediaPipe

Use ARCamera as input for MediaPipe? #343

Closed: patrick508 closed this issue 2 years ago

patrick508 commented 3 years ago

Hi,

For a project I'm working on, I got the MediaPipe sample to work with the webcam, just as in the sample project. However, in my project I use Unity's ARCameraManager to render the camera image to the screen. I need this camera because I am also trying to get depth data from it.

This is currently causing issues: MediaPipe tries to start and access the webcam while the ARCamera is already using it. I tried to make the MediaPipe sample work with the ARCamera but failed; it seems tightly coupled to the webcam for now. Is there any input or help I could get on this? Perhaps someone has already managed to get it working with AR Foundation's ARCameraManager?

In short, what I'm trying to achieve: give MediaPipe the Texture2D I get from ARCameraManager (I already managed to obtain this texture) and get the pose from that source.

homuler commented 3 years ago

In the following, I assume that you are using PoseTrackingSolution.

It seems tightly coupled to the webcam for now.

Not really. Currently, texture data is read as follows. https://github.com/homuler/MediaPipeUnityPlugin/blob/2994e4425d347260006e15edc8d375e0d35a3b0f/Assets/Mediapipe/Samples/Scenes/Pose%20Tracking/PoseTrackingSolution.cs#L134-L135 https://github.com/homuler/MediaPipeUnityPlugin/blob/2994e4425d347260006e15edc8d375e0d35a3b0f/Assets/Mediapipe/Samples/Common/Scripts/Solution.cs#L88-L108

So if you have a Texture2D at hand, you can read it like this:

// Instead of reading from the ImageSource...
// ReadFromImageSource(imageSource, textureFrame);

// ...copy your Texture2D into the TextureFrame on the CPU.
// Texture2D texture2d;
textureFrame.ReadTextureFromOnCPU(texture2d);

If you want to do it the right way, you will have to implement an ImageSource class that supports ARCamera. See ImageSource for more details.

ROBYER1 commented 3 years ago

I would also suggest trying out XRCpuImage with AR Foundation. Let me know if you get any further with this: https://docs.unity3d.com/Packages/com.unity.xr.arsubsystems@4.1/api/UnityEngine.XR.ARSubsystems.XRCpuImage.html

patrick508 commented 2 years ago

@homuler Thanks for the help! Sorry for the late reply. I managed to find a solution that is kind of hacky, but it works for now. I'm planning to clean it up soon and to look for a more low-level fix.

For those interested:

public class ARPoseProvider : BasePoseProvider, IPoseProvider
{
    public IEnumerator Start()
    {
        AssetLoader.Provide(new StreamingAssetsResourceManager());

        StartCoroutine(StartARCameraImagePoseTracking());

        yield return StartCoroutine(InitializeInferenceMode());

        initialized = true;
    }

    private IEnumerator StartARCameraImagePoseTracking()
    {
        yield return StartCoroutine(InitializeARCameraSources());

        yield return StartCoroutine(InitializeARCameraPoseGraphRoutine());

        arCameraManager.frameReceived += OnARCameraFrameReceived;
    }

    private void OnARCameraFrameReceived(ARCameraFrameEventArgs eventArgs)
    {
        arTextureToUpdate = GetCurrentColorTexture();

        UpdateARPose(arTextureToUpdate);
    }

    private void UpdateARPose(Texture2D texture)
    {
        // GetCurrentColorTexture returns null when no CPU image is available yet.
        if (texture == null)
        {
            return;
        }

        arTextureFrame ??= new TextureFrame(texture.width, texture.height);

        arTextureFrame.SetPixels32(texture.GetPixels32());

        arCameraPoseTrackingGraph.AddTextureFrameToInputStream(arTextureFrame).AssertOk();
        currentARCameraPose = arCameraPoseTrackingGraph.FetchNextValue();
    }

    private unsafe Texture2D GetCurrentColorTexture()
    {
        if (!arCameraManager.TryAcquireLatestCpuImage(out XRCpuImage image))
        {
            return null;
        }

        XRCpuImage.ConversionParams conversionParams = new XRCpuImage.ConversionParams
        {
            inputRect = new RectInt(0, 0, image.width, image.height),
            outputDimensions = new Vector2Int(image.width, image.height),
            outputFormat = TextureFormat.RGBA32,
            transformation = XRCpuImage.Transformation.MirrorX
        };

        int size = image.GetConvertedDataSize(conversionParams);
        NativeArray<byte> buffer = new NativeArray<byte>(size, Allocator.Temp);

        image.Convert(conversionParams, new IntPtr(buffer.GetUnsafePtr()), buffer.Length);
        image.Dispose();

        if (arTexture == null)
        {
            arTexture = new Texture2D(conversionParams.outputDimensions.x, conversionParams.outputDimensions.y, conversionParams.outputFormat, false);
        }

        arTexture.LoadRawTextureData(buffer);
        arTexture.Apply();
        buffer.Dispose();

        return arTexture;
    }
}

homuler commented 2 years ago

I have also written a minimal example that runs the Face Detection solution, for those who've landed on this issue. Please refer to it as well.

// Copyright (c) 2021 homuler
//
// Use of this source code is governed by an MIT-style
// license that can be found in the LICENSE file or at
// https://opensource.org/licenses/MIT.

using Mediapipe;
using Mediapipe.Unity;

using System;
using System.Collections;

using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using UnityEngine;
using UnityEngine.XR.ARFoundation;
using UnityEngine.XR.ARSubsystems;

using Stopwatch = System.Diagnostics.Stopwatch;

public class ARCameraManagerTest : MonoBehaviour
{
  [SerializeField] private ARCameraManager _cameraManager;
  [SerializeField] private TextAsset _configText; // attach `face_detection_gpu.txt`

  private CalculatorGraph _calculatorGraph;
  private NativeArray<byte> _buffer;
  private Stopwatch _stopwatch;
  private ResourceManager _resourceManager;
  private GpuResources _gpuResources;

  private IEnumerator Start()
  {
    _cameraManager.frameReceived += OnCameraFrameReceived;
    _stopwatch = new Stopwatch();

    _resourceManager = new StreamingAssetsResourceManager();
    yield return _resourceManager.PrepareAssetAsync("face_detection_short_range.bytes");
    yield return _resourceManager.PrepareAssetAsync("face_detection_full_range_sparse.bytes");

    _gpuResources = GpuResources.Create().Value();
    _calculatorGraph = new CalculatorGraph(_configText.text);
    _calculatorGraph.SetGpuResources(_gpuResources).AssertOk();

    _calculatorGraph.ObserveOutputStream("face_detections", 0, OutputCallback, true).AssertOk();

    var sidePacket = new SidePacket();
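    // input_rotation / input_*_flipped are side packets the sample graphs use
    // to correct the input image's orientation before inference.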
    sidePacket.Emplace("input_rotation", new IntPacket(0));
    sidePacket.Emplace("input_horizontally_flipped", new BoolPacket(false));
    sidePacket.Emplace("input_vertically_flipped", new BoolPacket(true));
    sidePacket.Emplace("model_type", new IntPacket(0));

    _calculatorGraph.StartRun(sidePacket).AssertOk();
    _stopwatch.Start();
  }

  private void OnDestroy()
  {
    _cameraManager.frameReceived -= OnCameraFrameReceived;

    var status = _calculatorGraph.CloseAllPacketSources();
    if (!status.Ok())
    {
      Debug.Log($"Failed to close packet sources: {status}");
    }

    status = _calculatorGraph.WaitUntilDone();
    if (!status.Ok())
    {
      Debug.Log(status);
    }

    _calculatorGraph.Dispose();
    _gpuResources.Dispose();
    _buffer.Dispose();
  }

  private unsafe void OnCameraFrameReceived(ARCameraFrameEventArgs eventArgs)
  {
    if (_cameraManager.TryAcquireLatestCpuImage(out var image))
    {
      InitBuffer(image);

      var conversionParams = new XRCpuImage.ConversionParams(image, TextureFormat.RGBA32);
      var ptr = (IntPtr)NativeArrayUnsafeUtility.GetUnsafePtr(_buffer);
      image.Convert(conversionParams, ptr, _buffer.Length);
      image.Dispose();

      var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, image.width, image.height, 4 * image.width, _buffer);
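      // MediaPipe timestamps are in microseconds; this converts Stopwatch ticks to µs.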
      var currentTimestamp = _stopwatch.ElapsedTicks / (TimeSpan.TicksPerMillisecond / 1000);
      var imageFramePacket = new ImageFramePacket(imageFrame, new Timestamp(currentTimestamp));

      _calculatorGraph.AddPacketToInputStream("input_video", imageFramePacket).AssertOk();
    }
  }

  private void InitBuffer(XRCpuImage image)
  {
    // NativeArray is a struct, so `== null` never fires; IsCreated is the proper test.
    var length = image.width * image.height * 4;
    if (_buffer.IsCreated && _buffer.Length == length)
    {
      return;
    }
    if (_buffer.IsCreated)
    {
      _buffer.Dispose(); // avoid leaking the previous buffer on resize
    }
    _buffer = new NativeArray<byte>(length, Allocator.Persistent, NativeArrayOptions.UninitializedMemory);
  }

  [AOT.MonoPInvokeCallback(typeof(CalculatorGraph.NativePacketCallback))]
  private static IntPtr OutputCallback(IntPtr graphPtr, int streamId, IntPtr packetPtr)
  {
    try
    {
      using (var packet = new DetectionVectorPacket(packetPtr, false))
      {
        var value = packet.IsEmpty() ? null : packet.Get();

        if (value != null && value.Count > 0)
        {
          foreach (var detection in value)
          {
            Debug.Log(detection);
          }
        }
      }
      return Status.Ok().mpPtr;
    }
    catch (Exception e)
    {
      return Status.FailedPrecondition(e.ToString()).mpPtr;
    }
  }
}
pinak1999 commented 1 year ago

I have also written a minimal example that runs the Face Detection solution, for those who've landed on this issue. Please refer to it as well.

[This quotes homuler's full ARCameraManagerTest example above.]

@homuler I get an error at line 39. I'm using the latest MediaPipeUnityPlugin-all.zip:

Assets\MediaPipeUnity\Samples\Scenes\Face Detection\AR-MP\ARCameraManagerTest.cs(39,64): error CS0407: 'IntPtr ARCameraManagerTest.OutputCallback(IntPtr, int, IntPtr)' has the wrong return type

homuler commented 1 year ago

@pinak1999 Please see https://github.com/homuler/MediaPipeUnityPlugin/issues/803#issuecomment-1351141965

dogadogan commented 1 year ago

I'm also getting the same error CS0407: 'IntPtr ARCameraManagerTest.OutputCallback(IntPtr, int, IntPtr)' has the wrong return type.

@homuler - How is https://github.com/homuler/MediaPipeUnityPlugin/issues/803#issuecomment-1351141965 related to this issue? Because it seems like that comment is using lifted_objects, whereas the code you have on this thread is for face_detections, right?

Doesn't face_detections use a vector the way you already have it?

homuler commented 1 year ago

NativePacketCallback, the 3rd argument of CalculatorGraph.ObserveOutputStream, now has to return StatusArgs, which is why the code above no longer compiles. https://github.com/homuler/MediaPipeUnityPlugin/blob/01cdd567ac5939b81114e27e8c13a92bde8275f4/Packages/com.github.homuler.mediapipe/Runtime/Scripts/Framework/CalculatorGraph.cs#L16

You can rewrite OutputCallback accordingly, but I recommend using OutputStream instead of calling CalculatorGraph#ObserveOutputStream directly.

dogadogan commented 1 year ago

I see, thanks! I tried the suggested approach but I'm getting the following error related to the OutputCallback: error CS0246: The type or namespace name 'FrameAnnotation' could not be found (are you missing a using directive or an assembly reference?)

Any thoughts? Here's the code I added:

private IEnumerator Start()
{
  // ...
  _faceDetectionsStream = new OutputStream<FrameAnnotationPacket, FrameAnnotation>(_calculatorGraph, "face_detections");
  _faceDetectionsStream.AddListener(OutputCallback);
  _calculatorGraph.StartRun(sidePacket).AssertOk();
  // ...
}

private void OutputCallback(object stream, OutputEventArgs<FrameAnnotation> eventArgs)
{
  Debug.Log(eventArgs.value);
}

homuler commented 1 year ago

If you want to use FaceDetection, see the FaceDetectionGraph example. https://github.com/homuler/MediaPipeUnityPlugin/blob/01cdd567ac5939b81114e27e8c13a92bde8275f4/Assets/MediaPipeUnity/Samples/Scenes/Face%20Detection/FaceDetectionGraph.cs#L78-L79

Note that the output type of face_detections is not FrameAnnotation.

# Detected faces. (std::vector<Detection>)
output_stream: "face_detections"

https://github.com/google/mediapipe/blob/0dee33ccba37fcb9362a90b0042cd46730a7f9b5/mediapipe/graphs/face_detection/face_detection_desktop_live.pbtxt#L8-L9

See also https://github.com/homuler/MediaPipeUnityPlugin/issues/803#issuecomment-1351141965.

Incidentally, the FrameAnnotation class is used in the Objectron sample, which is deprecated now.
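
For reference, here is a minimal sketch of the corrected declaration, modeled on the FaceDetectionGraph example linked above (check the sample for the exact constructor arguments):

// In Start(), after creating _calculatorGraph:
// face_detections carries std::vector<Detection>, so the stream is typed
// with DetectionVectorPacket / List<Detection>, not FrameAnnotation.
_faceDetectionsStream = new OutputStream<DetectionVectorPacket, List<Detection>>(_calculatorGraph, "face_detections");
_faceDetectionsStream.AddListener(OutputCallback);

// And the listener's signature becomes:
private void OutputCallback(object stream, OutputEventArgs<List<Detection>> eventArgs)
{
  Debug.Log(eventArgs.value);
}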

dogadogan commented 1 year ago

Thank you! I fixed the output type and now it works well for face detection :)

I actually wanted to do this ARCore integration for object detection, but gave face detection a try as the first step. So now I adjusted the code for object detection (i.e., by using "output_detections" etc.), and the connection seems to be working!

Now I want to make use of the object detector output by adding more lines to OutputCallback; however, even if I add something small, it gives me this error:

Error Unity MediaPipeException: INTERNAL: Graph has errors:
Error Unity System.NullReferenceException: Object reference not set to an instance of an object.
Error Unity   at ARMPObjectDetection.OutputCallback (System.Object stream, Mediapipe.Unity.OutputEventArgs`1[TValue] eventArgs)
Error Unity   at Mediapipe.Unity.OutputStream`2[TPacket,TValue].InvokeIfOutputStreamFound (System.IntPtr graphPtr, System.Int32 streamId, System.IntPtr packetPtr) [0x00000] in
Error Unity   at Mediapipe.Status.AssertOk ()
Error Unity   at ARMPObjectDetection.OnCameraFrameReceived (UnityEngine.XR.ARFoundation.ARCameraFrameEventArgs eventArgs) [
Error Unity   at UnityEngine.XR.ARFoundation.ARCameraManager.InvokeFrameReceivedEvent (UnityEngine.XR.ARSubsystems.XRCameraFrame frame) [0x00000] in
Error Unity   at UnityEngine.XR.ARFoundation.ARCameraManager.Update ()

What I wanted to do is similar to https://github.com/homuler/MediaPipeUnityPlugin/issues/1037#issuecomment-1764457384. I understand that eventArgs.value is a System.Collections.Generic.List. But even printing the number of detected objects (Debug.Log("Object count is: " + eventArgs.value.Count); in OutputCallback) gives the above error.

Do you know how I could address this issue? :)

dogadogan commented 1 year ago

Ah, alright. Based on https://github.com/homuler/MediaPipeUnityPlugin/pull/935#issuecomment-1642240120, I realized I should use the null-conditional operator, i.e. eventArgs.value?.Count, to guard against a null value.
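
For anyone copying this, a minimal sketch of the guarded callback (typing the stream as List<Detection> is an assumption based on the object detection graph's output):

// eventArgs.value is null for frames without detections, so guard the access.
private void OutputCallback(object stream, OutputEventArgs<List<Detection>> eventArgs)
{
  Debug.Log("Object count is: " + (eventArgs.value?.Count ?? 0));
}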

dogadogan commented 1 year ago

Hi! I just wanted to follow up on something related to this ARCore integration :)

I realized that the object detector/classifier works much better if the phone is held at a certain orientation. For example, it works much better if I hold the phone in landscape mode, compared to portrait mode.

I was wondering if there is a way to set the orientation of MediaPipe detection somehow in the code?

KiranJodhani commented 1 year ago

@homuler I am trying to implement this, but I can't get it working. Below are the steps I followed:

1. I created a sample script with a listener on arCameraManager; I can see it's working.

Now I have a Texture2D from OnARCameraFrameReceived, and I apply it with textureFrame.ReadTextureFromOnCPU(OutputTextureFromARCamera). (This is not in an ImageSourceSolution, and I am using Holistic.)

From here I am not sure whether I am doing it correctly. In Bootstrap, I changed the image source from WebCam to Image, but when I run, the texture doesn't show on the canvas. For testing I assigned a sample texture manually in the static image source, and it showed correctly. My question: can we keep the image source as Image and set the texture generated from OnARCameraFrameReceived? I tried this, but when I change the texture it somehow doesn't update.

Please share your thoughts.

homuler commented 1 year ago

@dogadogan

I was wondering if there is a way to set the orientation of MediaPipe detection somehow in the code?

At least, you can rotate the input image on the Unity side https://github.com/homuler/MediaPipeUnityPlugin/blob/2d2863ea740a6a5ad01854ea88ab5f48be2a36b6/Assets/MediaPipeUnity/Samples/Scenes/Tasks/Face%20Detection/FaceDetectorRunner.cs#L80 or using ImageTransformationCalculator. https://github.com/homuler/MediaPipeUnityPlugin/blob/2d2863ea740a6a5ad01854ea88ab5f48be2a36b6/Assets/MediaPipeUnity/Samples/Scenes/Object%20Detection/object_detection_gpu.txt#L65-L78
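
For example, with the CalculatorGraph approach earlier in this thread, you could derive the input_rotation side packet from the screen orientation. A sketch; the exact degree mapping below is an assumption to verify on your device:

// Pick the graph's input rotation from the current device orientation.
// The mapping is illustrative; check the sign/offset on a real device.
var rotationDegrees = Screen.orientation switch
{
  ScreenOrientation.Portrait => 90,
  ScreenOrientation.PortraitUpsideDown => 270,
  ScreenOrientation.LandscapeRight => 180,
  _ => 0, // LandscapeLeft and anything else
};
sidePacket.Emplace("input_rotation", new IntPacket(rotationDegrees));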

homuler commented 1 year ago

@KiranJodhani Would you create a new issue? I'm sorry, but I'm not sure what the problem is.

KiranJodhani commented 1 year ago

@homuler Thanks for the swift reply. I have created a new issue here; please feel free to ask any questions you have about it. It's a feature request, though:

https://github.com/homuler/MediaPipeUnityPlugin/issues/1045

dogadogan commented 12 months ago

Super helpful reply (https://github.com/homuler/MediaPipeUnityPlugin/issues/343#issuecomment-1791851150), thank you @homuler! :)

As a follow-up, I'm wondering if there is a way to access the originating input frame in an executed OutputCallback. For instance, when running Object Detection, only certain frames result in successful classification (some frames are no good due to motion blur, etc., in which case the returned eventArgs.value in OutputCallback is null).

Do you know if there is a way to get the successful camera frame when OutputCallback returns non-empty eventArgs.value? Because I would like to do further processing on the original image then, based on the results in eventArgs.value. Or perhaps there's a smarter way to do this?

homuler commented 12 months ago

The Task API is designed to receive input images through a callback, https://github.com/homuler/MediaPipeUnityPlugin/blob/2d2863ea740a6a5ad01854ea88ab5f48be2a36b6/Packages/com.github.homuler.mediapipe/Runtime/Scripts/Tasks/Vision/FaceDetector/FaceDetectorOptions.cs#L27 but OutputStream is not, so it may be difficult, if not impossible.

If that's acceptable, you can get the same result by executing the graph synchronously. In that case, you presumably still have a reference to the input image when you get the result.
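
Roughly, the synchronous pattern looks like this (a sketch; WaitUntilIdle and TryGetNext are used the way the samples' sync mode uses them, but verify the exact signatures against your plugin version, and ProcessFrame is a hypothetical handler):

// Feed the frame, block until the graph is idle, then read the output
// while the input texture is still at hand.
_calculatorGraph.AddPacketToInputStream("input_video", imageFramePacket).AssertOk();
_calculatorGraph.WaitUntilIdle().AssertOk();

if (_detectionsStream.TryGetNext(out var detections))
{
  ProcessFrame(arTexture, detections); // both refer to the same frame
}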

dogadogan commented 12 months ago

Oh great, good to know, thanks @homuler!

Do you have any example code on how to use Task API instead of OutputStream/OutputCallback in this repo or elsewhere? My code is working well with the previous setup, so it's a bit hard for me to know where to start readjusting it for this.

Also, any pointers on how to execute the graph synchronously would be helpful too! In my use case, ARCore would run continuously in the background for the main AR tasks, and as long as it's relatively fast, it should be ok I believe. But at the same time, I wonder if this would interfere with ARCore/Foundation's OnCameraFrameReceived...

homuler commented 12 months ago

Do you have any example code on how to use Task API instead of OutputStream/OutputCallback in this repo or elsewhere?

https://github.com/homuler/MediaPipeUnityPlugin/blob/2d2863ea740a6a5ad01854ea88ab5f48be2a36b6/Assets/MediaPipeUnity/Samples/Scenes/Tasks/Face%20Detection/FaceDetectorRunner.cs

Note that ObjectDetector is not ported yet.

Also, any pointers on how to execute the graph synchronously would be helpful too!

See https://github.com/homuler/MediaPipeUnityPlugin/wiki/Getting-Started#get-imageframe. Alternatively, if you have ample memory, you can keep the image for reference and search for an image with the same timestamp as the output timestamp.
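
A small self-contained sketch of the second idea (all names here are illustrative, not plugin API):

using System.Collections.Generic;
using UnityEngine;

// Cache the last few input frames keyed by timestamp so that, once you can
// read the output timestamp (see the follow-up below about OutputStream),
// you can look up the originating image.
public class FrameCache
{
  private const int _Capacity = 8;
  private readonly Dictionary<long, Color32[]> _frames = new Dictionary<long, Color32[]>();
  private readonly Queue<long> _order = new Queue<long>();

  public void Add(long timestampMicrosec, Color32[] pixels)
  {
    if (_order.Count >= _Capacity)
    {
      _frames.Remove(_order.Dequeue()); // evict the oldest frame
    }
    _frames[timestampMicrosec] = pixels;
    _order.Enqueue(timestampMicrosec);
  }

  public bool TryGet(long timestampMicrosec, out Color32[] pixels) =>
    _frames.TryGetValue(timestampMicrosec, out pixels);
}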

dogadogan commented 12 months ago

Great, thanks @homuler! Any idea when ObjectDetector would be ported for Task API?

Thinking about it, I feel like the synchronous approach might cause problems for real-time ARCore. Have you tried this with AR solutions in the past?

Your idea of storing the last few frames in an array for reference, and later searching for the one with the same timestamp as the output timestamp in the OutputCallback makes sense! However, do you know how I could access the originating timestamp from within OutputCallback (so I can use it for comparison across the captured frames stored in the array)?

dogadogan commented 11 months ago

Hi @homuler! :) I would appreciate it if you could share any insights you might have about the comment above ^^ Thank you for your help and time!!

homuler commented 11 months ago

Any idea when ObjectDetector would be ported for Task API?

Maybe when I feel motivated. If not pressured by others, it might be achievable by around next month.

However, do you know how I could access the originating timestamp from within OutputCallback

Ah, it's designed not to pass the timestamp to the callback, so you'll need to make modifications to the following lines. https://github.com/homuler/MediaPipeUnityPlugin/blob/2d2863ea740a6a5ad01854ea88ab5f48be2a36b6/Packages/com.github.homuler.mediapipe/Runtime/Scripts/Unity/OutputStream.cs#L400

P.S. Responding to closed issues is challenging, so if necessary, please create a new issue.