microsoft / psi

Platform for Situated Intelligence
https://github.com/microsoft/psi/wiki

Extracting HL2 Hand Pose Data from PSI Server and Sending to Python Over TCP #338

Open sakibreza opened 4 days ago

sakibreza commented 4 days ago

I have successfully used the PSI documentation to capture RGB frame data from the HoloLens 2 and send it to a Python server over TCP. The approach works well: I convert the RGB frame data into bytes and transmit it.

Now I'm attempting to do the same for hand pose data. Specifically, I am trying to extract hand joint coordinates from PSI and send them over TCP to my Python server, but I'm running into difficulty. I'm looking for an equivalent of the following code, which I used for extracting RGB frames:

imageViewer = new ImageViewer(captureServerPipeline, $"Image Viewer: {title}", false);
stream.PipeTo(imageViewer, DeliveryPolicy.LatestMessage);

var webcamBytes =
    stream
        .Sample(TimeSpan.FromMilliseconds(200))
        .EncodeJpeg(90, DeliveryPolicy.LatestMessage)
        .Select(jpg => jpg.Resource.GetBuffer());

var frameWriter = new NetMQWriter<byte[]>(
    captureServerPipeline,
    "frames",
    "tcp://127.0.0.1:30000",
    MessagePackFormat.Instance);
webcamBytes.PipeTo(frameWriter);

N.B.: To make the RGB frame capture work, I modified the code at L449 in HoloLensCaptureServer.cs, and it successfully started sending RGB frames to the server.

This works for capturing the RGB frames. However, I am unable to figure out how to extract the actual hand pose data (specifically the joint coordinates) from PSI. I've tried using the Microsoft.Psi.MixedReality.StereoKit.HandsSensor class, but I couldn't find a way to access the joint coordinates.

Could you please guide me on how to extract the hand pose data from PSI, similar to how I am extracting RGB frames? Any pointers or code examples would be greatly appreciated.

Thank you!

sandrist commented 2 days ago

The HoloLensCaptureApp sends hand data as a stream of tuples: (Microsoft.Psi.MixedReality.OpenXR.Hand leftHand, Microsoft.Psi.MixedReality.OpenXR.Hand rightHand) with the name "Hands", as you can see on L532. The class for this type of hand is defined here: https://github.com/microsoft/psi/blob/master/Sources/MixedReality/Microsoft.Psi.MixedReality/OpenXR/Hand.cs

You could look at CaptureTcpStream<T> on the server side, which is the generic method for capturing all streams from the HoloLens. One of those streams will have the name "Hands", and T will be (Microsoft.Psi.MixedReality.OpenXR.Hand, Microsoft.Psi.MixedReality.OpenXR.Hand).
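To make that concrete, here is a rough sketch of how one might special-case the "Hands" stream by name inside the server's generic stream handling. The variable names streamName and stream are hypothetical stand-ins for whatever the server code actually uses at that point; this is illustrative, not a drop-in patch:

using Microsoft.Psi;
using Hand = Microsoft.Psi.MixedReality.OpenXR.Hand;

// Hypothetical sketch: branch on the stream's name to get a typed hands stream.
// `streamName` and `stream` stand in for the server's actual local variables.
if (streamName == "Hands")
{
    var handsStream = stream as IProducer<(Hand Left, Hand Right)>;

    // handsStream now carries (leftHand, rightHand) tuples that can be
    // transformed and forwarded, e.g. with Select(...) and PipeTo(...).
}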

Once you have the stream, you can do whatever you want with it. For example, if you name the stream handsStream, you could do something like:

var leftHandJointPositions =
    handsStream.Select(tuple =>
        tuple.Item1.Joints
            .Select(coordinateSystem => coordinateSystem.Origin)
            .ToArray());

which would give you a stream of the left hand's joint positions as an array of Point3D (x,y,z).
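Building on that, and mirroring the NetMQWriter approach from your RGB pipeline, a minimal sketch of sending both hands' joint positions to the Python server might look like the following. The topic name "hands" and the port are hypothetical choices, and this assumes handsStream and captureServerPipeline are already in scope (you may also need to guard against untracked hands, depending on how the Hand type reports tracking state):

// Hypothetical sketch: flatten both hands' joint positions into a double[]
// ([x, y, z, x, y, z, ...], left hand first) and publish over NetMQ,
// mirroring the RGB frame pipeline above.
var handJointDoubles =
    handsStream.Select(hands =>
        hands.Item1.Joints
            .Concat(hands.Item2.Joints)
            .Select(coordinateSystem => coordinateSystem.Origin)
            .SelectMany(p => new[] { p.X, p.Y, p.Z })
            .ToArray());

var handsWriter = new NetMQWriter<double[]>(
    captureServerPipeline,
    "hands",                  // hypothetical topic name
    "tcp://127.0.0.1:30001",  // hypothetical port, distinct from the frames port
    MessagePackFormat.Instance);
handJointDoubles.PipeTo(handsWriter);

On the Python side you would then unpack each MessagePack message into a flat list of coordinates, just as you presumably decode the JPEG bytes for the frames.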