Using Handy.js with computer camera and MediaPipe

iNat28 commented 3 years ago

Hi,

I'm currently working on a project that interprets ASL into text, using a computer camera and MediaPipe to get the hand model. I was wondering if there would be a way to use Handy.js for the hand model interpretation. Would there be a way for me to use Handy.js with the hand model from MediaPipe, instead of using the hand models from the Oculus Quest (potentially using the WebXR hand tracking API)? If so, could you point me in the right direction on how I would be able to do that?

Thanks!

stewdio commented 3 years ago

👋 Hi, iNat28. I see your user account is brand new so welcome! (Though I suspect you’re more seasoned than your account metadata lets on 😉)

If you’re using MediaPipe to get 3D joint positions for a hand then yes, you are in luck. All Handy needs are the X, Y, Z positions for each hand joint as mapped here: https://github.com/immersive-web/webxr-hand-input/blob/master/explainer.md#appendix-proposed-idl

From there you’re good to go—no further machine learning required 👍
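As a rough illustration of that mapping, here is a minimal sketch that renames MediaPipe's 21 hand landmarks to WebXR joint names. The pairing is an assumption, not something Handy or MediaPipe defines: MediaPipe's MCP/PIP/DIP landmarks are matched to the WebXR `phalanx-proximal` / `phalanx-intermediate` / `phalanx-distal` joints, and the four non-thumb `*-metacarpal` joints are left out because MediaPipe has no equivalent landmark. `landmarksToJointPositions` is a hypothetical helper name.

```javascript
// Assumed pairing of MediaPipe landmark indices (0..20) to WebXR joint names.
const FINGERS = ['index-finger', 'middle-finger', 'ring-finger', 'pinky-finger'];
const SEGMENTS = ['phalanx-proximal', 'phalanx-intermediate', 'phalanx-distal', 'tip'];

const MEDIAPIPE_TO_WEBXR = [
  'wrist',                  // 0  WRIST
  'thumb-metacarpal',       // 1  THUMB_CMC
  'thumb-phalanx-proximal', // 2  THUMB_MCP
  'thumb-phalanx-distal',   // 3  THUMB_IP
  'thumb-tip',              // 4  THUMB_TIP
  // 5..20: four landmarks (MCP, PIP, DIP, TIP) per remaining finger.
  ...FINGERS.flatMap((finger) => SEGMENTS.map((segment) => `${finger}-${segment}`)),
];

// Hypothetical helper: turns an array of 21 { x, y, z } landmarks into an
// object keyed by WebXR joint name.
function landmarksToJointPositions(landmarks) {
  const joints = {};
  landmarks.forEach((landmark, index) => {
    joints[MEDIAPIPE_TO_WEBXR[index]] = { x: landmark.x, y: landmark.y, z: landmark.z };
  });
  return joints;
}
```

MediaPipe's image-space landmarks are normalized to the frame, so its world-space landmark output is probably the better input here, though that is worth verifying against your own data.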

The part that might be tricky is replacing the Three.js WebXR hand tracking API / visualization bits with your own custom code for listening to MediaPipe data. It’s a thorny process because the code actually listening for data-change events is kind of buried within Three code. But the short answer is yes, you can definitely use hand joint data coming from elsewhere to drive Handy.

iNat28 commented 3 years ago

First off, thank you so much for your response!

Second, we had a follow-up question. While working with Handy.js, we noticed that the joint object was used frequently throughout the code, which has properties such as .position and .matrix. We weren't sure how to properly implement this joint object, and we wanted to know if it is possible to create our own instance of this joint object type. If this is not possible, would there be a way to create our own joint object that includes all the properties necessary for its use in Handy.js?

In addition, while looking at the preparePosition function in the readLiveShapeData function, it uses the 12th-14th elements from the 4x4 matrix of the inputted joint. We were curious to know why specifically those elements are used to represent the position of a given joint. Would you be able to explain to us why this is?

Thanks!

stewdio commented 3 years ago

You’re welcome!

The joint object itself is a THREE.Object3D, which implements the position and matrix properties, among many others. (These Object3D instances are just ingesting the raw matrix data that the WebXR Hands API is providing. Also, if you’re seeing THREE.Group instances, Group just extends Object3D.) I’m not sure why you would need to re-implement these; are you trying to port this to a different 3D engine? If so, your engine of choice should already have some equivalent structure that you can use instead.
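If you do end up outside Three.js, a plain object exposing the same two property names may be enough as a starting point. The sketch below is a hypothetical stand-in (`makeJointLike` is not part of Handy or Three.js). It covers translation only, and Handy may rely on other Object3D members that it does not provide.

```javascript
// Hypothetical stand-in for a joint: mimics only the `position` and
// `matrix.elements` shape of a THREE.Object3D (translation only, no rotation).
function makeJointLike(x, y, z) {
  return {
    position: { x, y, z },
    matrix: {
      // Column-major 4x4 identity with the translation in slots 12, 13, 14.
      elements: [
        1, 0, 0, 0,
        0, 1, 0, 0,
        0, 0, 1, 0,
        x, y, z, 1,
      ],
    },
  };
}

const tip = makeJointLike(0.02, 0.11, -0.03);
console.log(tip.matrix.elements.slice(12, 15)); // [ 0.02, 0.11, -0.03 ]
```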

The preparePosition function uses matrix multiplication to determine the true offset between the wrist joint’s world matrix and the matrix of whatever joint has been passed in as an argument. (For expediency readLiveShapeData only assesses finger tip joints.) Three.js stores Matrix4.elements as a flat Array in column-major order, so indices 12, 13, and 14 (counting from zero) hold the translation component. By taking those three elements of the resulting matrix we are directly extracting the X, Y, Z position data that is stored within the matrix without having to make any additional Three.js calls. See also THREE.Matrix4.prototype.decompose as an alternative means of extracting position from a Matrix4.
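To make the idea concrete without pulling in Three.js, here is a minimal sketch in plain JavaScript. It assumes the multiplication is the inverse of the wrist's world matrix times the joint's matrix, and that both are rigid transforms (rotation plus translation, no scale). The helper names `multiply`, `invertRigid`, and `relativePosition` are hypothetical and this is not Handy's actual code.

```javascript
// Multiply two column-major 4x4 matrices: returns a * b.
function multiply(a, b) {
  const out = new Array(16).fill(0);
  for (let c = 0; c < 4; c++) {
    for (let r = 0; r < 4; r++) {
      for (let k = 0; k < 4; k++) {
        out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
      }
    }
  }
  return out;
}

// Invert a rigid transform: transpose the rotation, then negate the
// rotated translation. Not valid for matrices that include scale.
function invertRigid(m) {
  const out = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1];
  for (let r = 0; r < 3; r++) {
    for (let c = 0; c < 3; c++) out[c * 4 + r] = m[r * 4 + c];
    out[12 + r] = -(m[r * 4] * m[12] + m[r * 4 + 1] * m[13] + m[r * 4 + 2] * m[14]);
  }
  return out;
}

// Position of a joint expressed in the wrist's local space.
function relativePosition(wristWorld, jointWorld) {
  const local = multiply(invertRigid(wristWorld), jointWorld);
  return [local[12], local[13], local[14]]; // X, Y, Z translation slots
}

// Wrist at (1, 2, 3), rotated 90 degrees about Z; joint at (1, 3, 3).
const wrist = [0, 1, 0, 0, -1, 0, 0, 0, 0, 0, 1, 0, 1, 2, 3, 1];
const joint = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 3, 3, 1];
console.log(relativePosition(wrist, joint)); // [ 1, 0, 0 ]
```

The joint sits one unit along world Y from the wrist, which is the wrist's own X axis after the rotation, so the wrist-relative position comes out as one unit along X regardless of how the hand is oriented in the world.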

stewdio / handy.js
