BlazePose documentation of z coordinates

shiffman commented 1 week ago

Hi everyone! I'm working on a video tutorial about BodyPose in ml5.js 1.0. While making the video, I found some improvements I think we could make to the documentation of how the 3D coordinates work with BlazePose.

In the tfjs-models documentation the units for keypoints3D are explained as follows:

For the keypoints3D, x, y and z represent absolute distance in meters in a 2 x 2 x 2 meter cubic space. The range for each axis goes from -1 to 1 (therefore 2m total delta). The z is always perpendicular to the xy plane that passes the center of the hip, so the coordinate for the hip center is (0, 0, 0).
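Because the keypoints3D values are described as meters, the distance between two 3D keypoints can be read directly as a real-world length. The following is a minimal sketch under that assumption; `distance3D` is a hypothetical helper (not part of ml5.js or tfjs-models), and the sample values are copied from the raw output posted further down in this thread.

```javascript
// Hypothetical helper: Euclidean distance between two 3D keypoints.
// Assumes x, y, z are in meters, as the tfjs-models docs describe.
function distance3D(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Sample keypoints3D values taken from the raw BlazePose output in this thread.
const nose = {
  x: 0.01237955486137293,
  y: -0.587451014244643,
  z: -0.2591571422454245,
};
const leftEyeInner = {
  x: 0.026530376499100554,
  y: -0.6236724986839615,
  z: -0.24141936558530744,
};

// Distance from the nose to the inner left eye, in meters.
const noseToEye = distance3D(nose, leftEyeInner);

// The hip center is the origin, so this is the nose's distance from the hips.
const noseToHipCenter = distance3D(nose, { x: 0, y: 0, z: 0 });

console.log(noseToEye, noseToHipCenter);
```

For these sample values the nose-to-eye distance comes out at roughly 0.04, which is a plausible length in meters and far more believable than the z values seen in the 2D keypoints array.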

We should probably include a simplified version of this in our documentation here:

Screenshot 2024-10-07 at 4 35 08 PM

I was also confused to find that the 2D keypoints array (as described in our docs) also includes a z value. I don't believe this is part of the original BlazePose data. It looks like this is the code where it is being added but the units appear to be different. @ziyuan-linn, do you know offhand what is happening here? Is there some code I'm missing which is trying to change the real world "meters" range to pixel units?

This is what I see in the console:

keypoints: Screenshot 2024-10-07 at 4 38 26 PM

keypoints3D: Screenshot 2024-10-07 at 4 38 33 PM

And now under the nose property: Screenshot 2024-10-07 at 4 40 26 PM

(Interesting to note that the confidence score is different for keypoints3D!)

ziyuan-linn commented 1 week ago

I just ran BlazePose and here is a raw unprocessed output.

[
    {
        "score": 0.995323121547699,
        "keypoints": [
            {
                "x": 245.63711038340324,
                "y": 294.79695594356946,
                "z": -689535.245693554,
                "score": 0.9987707333798664,
                "name": "nose"
            },
            {
                "x": 270.6604500325918,
                "y": 252.8618703269841,
                "z": -639394.1690627289,
                "score": 0.9983254733819559,
                "name": "left_eye_inner"
            },
            // ...
        ],
        "keypoints3D": [
            {
                "x": 0.01237955486137293,
                "y": -0.587451014244643,
                "z": -0.2591571422454245,
                "score": 0.9982515152476616,
                "name": "nose"
            },
            {
                "x": 0.026530376499100554,
                "y": -0.6236724986839615,
                "z": -0.24141936558530744,
                "score": 0.9973494677709942,
                "name": "left_eye_inner"
            },
            // ...
        ]
    }
]

Looks like the z values are present for the 2D keypoints. I have no idea what they represent, and they also do not seem to be in the documentation. If everyone agrees, I think we can just take them out.

I think the named keypoints just copy the x, y, z, and confidence values from the keypoints array. Should we also add the 3D values to the named keypoints?

shiffman commented 1 week ago

Thank you for looking into this @ziyuan-linn! Let's do the following:

Remove the z value in the keypoints array.

I was about to say let's add only the z value from keypoints3D to the named values, but it might make sense for us to provide the full xyz. What about the following:

nose: {
  x: 332.6024622758805,
  y: 265.78330263473146,
  confidence: 0.9993924452777454,
  keypoint3D: {
    x: 0.05988978072436527,
    y: -0.5489126977664187,
    z: -0.26418375968933105
  }
}

Or is this overdoing it and making it super complicated? @MOQN I'd be curious for your thoughts?
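For illustration, here is a minimal sketch of how the proposed named-keypoint shape could be assembled from the raw output. It is not the actual ml5.js implementation: `addNamedKeypoints` is a hypothetical helper, it assumes `keypoints` and `keypoints3D` list the same points in the same order (as they do in the raw output above), and it assumes the 2D `score` is the value exposed as `confidence`.

```javascript
// Hypothetical helper: attaches one named property per keypoint, combining
// the 2D pixel position with a nested keypoint3D object.
// Assumes keypoints[i] and keypoints3D[i] describe the same body part.
function addNamedKeypoints(pose) {
  const named = {};
  pose.keypoints.forEach((kp, i) => {
    const kp3D = pose.keypoints3D[i];
    named[kp.name] = {
      x: kp.x,
      y: kp.y,
      confidence: kp.score, // assumption: the 2D score is used
      keypoint3D: { x: kp3D.x, y: kp3D.y, z: kp3D.z },
    };
  });
  // Return a new object so the raw pose is left untouched.
  return { ...pose, ...named };
}

// Sample values taken from the raw BlazePose output in this thread.
const rawPose = {
  keypoints: [
    { x: 245.63711038340324, y: 294.79695594356946, score: 0.9987707333798664, name: "nose" },
  ],
  keypoints3D: [
    { x: 0.01237955486137293, y: -0.587451014244643, z: -0.2591571422454245, score: 0.9982515152476616, name: "nose" },
  ],
};

const namedPose = addNamedKeypoints(rawPose);
console.log(namedPose.nose);
```

Nesting the 3D values keeps the top-level `x` and `y` in pixel units, so sketches that only draw 2D points would not need to change.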

On another note, I'm trying to figure out why the tfjs docs list 4 extra points for BlazePose that don't actually show up in the model. @ziyuan-linn have you run across this in your research at all?

34: forehead
35: leftThumb
36: leftHand
37: rightThumb
38: rightHand

ziyuan-linn commented 1 week ago

@shiffman I also have no idea what those keypoints are. The tfjs documentation can sometimes be puzzling. The BlazePose model is trained with 33 keypoints as the Google MediaPipe Model Card suggests. I think it should be safe to ignore them.

I will reply with my thoughts about the API in the ml5-next-gen thread.

ziyuan-linn commented 1 week ago

Actually, just looking at the model card, I think the 2D z value is supposed to be:

Z coordinate is measured in "image pixels" like the X and Y screen coordinates and represents the distance relative to the plane of the subject's hips, which is the origin of the Z axis. Negative values are between the hips and the camera; positive values are behind the hips. Z coordinate scale is similar with X, Y scales but has different nature as obtained not via human annotation, by fitting synthetic data (GHUM model) to the 2D annotation. Note, that Z is not metric but up to scale.

However, a value like -437589 is nowhere near accurate. I think removing it for now might be the best choice.
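As a rough sketch of what removing the value could look like, the 2D keypoints can be copied without their `z` property. `stripZ` is a hypothetical helper written for this example, not the code that ml5.js actually uses; the sample values come from the raw output posted earlier in this thread.

```javascript
// Hypothetical helper: returns a new array of 2D keypoints with z omitted.
// Destructuring pulls z out and keeps every other property unchanged.
function stripZ(keypoints) {
  return keypoints.map(({ z, ...rest }) => rest);
}

// Sample 2D keypoints taken from the raw BlazePose output in this thread.
const rawKeypoints = [
  {
    x: 245.63711038340324,
    y: 294.79695594356946,
    z: -689535.245693554,
    score: 0.9987707333798664,
    name: "nose",
  },
  {
    x: 270.6604500325918,
    y: 252.8618703269841,
    z: -639394.1690627289,
    score: 0.9983254733819559,
    name: "left_eye_inner",
  },
];

const cleanedKeypoints = stripZ(rawKeypoints);
console.log(cleanedKeypoints);
```

The helper builds new objects rather than deleting `z` in place, so the model's raw output stays available for debugging.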

MOQN commented 1 week ago

Thank you for all of these findings and thoughtful discussion. I completely agree with removing the Z value!

MOQN commented 1 week ago

nose: {
  x: 332.6024622758805,
  y: 265.78330263473146,
  confidence: 0.9993924452777454,
  keypoint3D: {
    x: 0.05988978072436527,
    y: -0.5489126977664187,
    z: -0.26418375968933105
  }
}
Or is this overdoing it and making it super complicated? @MOQN I'd be curious for your thoughts?

@shiffman, I believe it's a great suggestion. Very intuitive! (Edit: I wrote it too quickly, suggesting 3d, and didn't realize the key begins with a number, haha.) Alternatively, we could use pos3D, position, position3D, coords, coords3D or depth instead of keypoint3D.

nose: {
  x: 332.6024622758805,
  y: 265.78330263473146,
  confidence: 0.9993924452777454,
  position: {
    x: 0.05988978072436527,
    y: -0.5489126977664187,
    z: -0.26418375968933105
  }
}

or

nose: {
  x: 332.6024622758805,
  y: 265.78330263473146,
  confidence: 0.9993924452777454,
  x3D: 0.05988978072436527,
  y3D: -0.5489126977664187,
  z3D: -0.26418375968933105
}

The names keypoints and keypoints3D can then be used only for the arrays, which give access to the entire position data.

keypoints: [{ x, y, confidence, name }, ...], keypoints3D: [{ x, y, z, confidence, name }, ...],

[
  {
    box: { width, height, xMax, xMin, yMax, yMin },
    id: 1,
    keypoints: [{ x, y, confidence, name }, ...],
    keypoints3D: [{ x, y, z, confidence, name }, ...],
    left_ankle: { x, y, z, confidence },
    ...
    confidence: 0.28,
  },
  ...
];

shiffman commented 1 week ago

These are great suggestions! I'd love to hear everyone's feedback during the meeting today!

shiffman commented 1 week ago

Hello web team! Just noting this has now been incorporated so we can update the documentation! See https://github.com/ml5js/ml5-next-gen/pull/215

alanvww commented 5 days ago

@shiffman @MOQN @ziyuan-linn Hi team, @leey611 and I are currently working on the documentation to address this issue. While the named keypoints update is functioning well, we’ve discovered that the keypoints array still includes the "wrong" z values. We haven’t addressed it in the ml5@1.1.0 update, have we?

shiffman commented 5 days ago

Ah, I just checked and you are right! Let me make a quick fix for this and we can do a 1.1.1 release!

ml5js / ml5-website-v02-docsify

BlazePose documentation of z coordinates #181