ml5js / ml5-next-gen

Repo for next generation of ml5.js: friendly machine learning for the web! 🤖

Output API for hand #25

Open ziyuan-linn opened 1 year ago

ziyuan-linn commented 1 year ago

Hi everyone, I'm opening this thread to discuss the model prediction output for hand detection, though I think a lot of this also applies to other landmark detection models.


The tf.js original output for hand detection looks like this:

    [{
      score: 0.8,
      handedness: "Right",
      keypoints: [
        {x: 105, y: 107, name: "wrist"},
        {x: 108, y: 160, name: "pinky_finger_tip"},
        ...
      ],
      keypoints3D: [
        {x: 0.00388, y: -0.0205, z: 0.0217, name: "wrist"},
        {x: -0.025138, y: -0.0255, z: -0.0051, name: "pinky_finger_tip"},
        ...
      ]
    }, {
      score: 0.9,
      handedness: "Left",
      ...
    }]

One idea is to expose each keypoint by name so they can be more intuitively accessed, for example:

    [{
      score: 0.8,
      handedness: "Right",
      wrist: { //<-----------------------add
        x: 105,
        y: 107,
        3dx: 0.00388,
        3dy: -0.0205,
        3dz: 0.0217
      },
      pinky_finger_tip: { //<-----------------------add
        x: 108,
        y: 160,
        3dx: -0.025138,
        3dy: -0.0255,
        3dz: -0.0051
      },
      ...
      keypoints: [
        {x: 105, y: 107, name: "wrist"},
        {x: 108, y: 160, name: "pinky_finger_tip"},
        ...
      ],
      keypoints3D: [
        {x: 0.00388, y: -0.0205, z: 0.0217, name: "wrist"},
        {x: -0.025138, y: -0.0255, z: -0.0051, name: "pinky_finger_tip"},
        ...
      ]
    }, {
      score: 0.9,
      handedness: "Left",
      ...
    }]
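
A minimal sketch of how the named properties could be generated from the tf.js output. `addNamedKeypoints` is a hypothetical helper, not an existing ml5 or tf.js function, and it assumes `keypoints` and `keypoints3D` list the same landmarks in the same order. The 3D keys are quoted here because a bare `3dx` is not a valid JavaScript identifier, so they would have to be read as `hand.wrist["3dx"]`.

```javascript
// Hypothetical helper: copies a tf.js hand object and adds one property per keypoint name.
function addNamedKeypoints(hand) {
  const named = { ...hand };
  hand.keypoints.forEach((point, i) => {
    // Assumption: keypoints3D[i] describes the same landmark as keypoints[i].
    const point3D = hand.keypoints3D[i];
    named[point.name] = {
      x: point.x,
      y: point.y,
      "3dx": point3D.x,
      "3dy": point3D.y,
      "3dz": point3D.z,
    };
  });
  return named;
}

// Usage with the sample values from this thread
const sampleHand = {
  score: 0.8,
  handedness: "Right",
  keypoints: [{ x: 105, y: 107, name: "wrist" }],
  keypoints3D: [{ x: 0.00388, y: -0.0205, z: 0.0217, name: "wrist" }],
};
const namedHand = addNamedKeypoints(sampleHand);
console.log(namedHand.wrist.x, namedHand.wrist["3dz"]); // 105 0.0217
```

Names such as `x3D` would avoid the bracket access, which may be worth considering before settling on the key names.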

@yining1023 suggested grouping landmarks of each finger together with intuitive names like wrist, thumb, etc...

    [{
        "handedness": "Left",
        "wrist": { //<-----------------------add
            "x": 57.2648811340332,
            "y": 489.2754936218262,
            "z": 0.00608062744140625,
            "confidence": 0.89
        },
        "thumb": [ //<-----------------------add
            {
                "x": 57.2648811340332,
                "y": 489.2754936218262,
                "z": 0.00608062744140625,
                "confidence": 0.89,
            },
            ...
        ],
        "indexFinger":[], //<-----------------------add
        "middleFinger":[], //<-----------------------add
        "ringFinger":[], //<-----------------------add
        "pinky":[], //<-----------------------add
        "keypoints": [
            {
                "x": 57.2648811340332,
                "y": 489.2754936218262,
                "name": "wrist"
            },
            ...
        ],
        "keypoints3D": [
            {
                "x": -0.03214273601770401,
                "y": 0.08357296139001846,
                "z": 0.00608062744140625,
                "name": "wrist"
            },
            ...
        ],
        "score": 0.9638671875,
    }, {
        "handedness": "Right",
        ...
    }]

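A rough sketch of how the per-finger grouping could be derived from the flat `keypoints` array. `FINGER_PREFIXES` and `groupByFinger` are hypothetical names, and the sketch assumes every finger keypoint name starts with a prefix such as `pinky_finger_` (only `wrist` and `pinky_finger_tip` actually appear in this thread). It only regroups the 2D points and does not merge in `z` or `confidence`.

```javascript
// Hypothetical mapping from output property name to keypoint-name prefix.
const FINGER_PREFIXES = {
  thumb: "thumb_",
  indexFinger: "index_finger_",
  middleFinger: "middle_finger_",
  ringFinger: "ring_finger_",
  pinky: "pinky_finger_",
};

// Returns a copy of the hand with a wrist property and one array per finger.
function groupByFinger(hand) {
  const grouped = { ...hand };
  const wrist = hand.keypoints.find((point) => point.name === "wrist");
  if (wrist) {
    grouped.wrist = wrist;
  }
  for (const [finger, prefix] of Object.entries(FINGER_PREFIXES)) {
    // Points without a name are skipped instead of throwing.
    grouped[finger] = hand.keypoints.filter(
      (point) => typeof point.name === "string" && point.name.startsWith(prefix)
    );
  }
  return grouped;
}

// Usage
const groupedHand = groupByFinger({
  handedness: "Left",
  keypoints: [
    { x: 105, y: 107, name: "wrist" },
    { x: 108, y: 160, name: "pinky_finger_tip" },
  ],
});
console.log(groupedHand.pinky.length, groupedHand.thumb.length); // 1 0
```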

I think the handedness property could be very useful for users. However, it is currently the opposite of the actual hand (a left hand is labeled as right). I found that when flipHorizontal is set to true, the handedness is labeled correctly. We could flip the handedness value within ml5 when flipHorizontal is false.
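
A small sketch of that flip, assuming the behaviour described above (labels are mirrored only when `flipHorizontal` is false). `correctHandedness` is a hypothetical helper name, not part of ml5 or tf.js.

```javascript
// Hypothetical helper: swaps "Left"/"Right" unless the input was already flipped.
function correctHandedness(hand, flipHorizontal) {
  if (flipHorizontal) {
    return hand; // labels are reported correctly in this mode
  }
  const corrected = hand.handedness === "Left" ? "Right" : "Left";
  // Return a copy so the raw model output is left untouched.
  return { ...hand, handedness: corrected };
}

// Usage
console.log(correctHandedness({ handedness: "Right" }, false).handedness); // Left
console.log(correctHandedness({ handedness: "Right" }, true).handedness); // Right
```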

Keypoint Diagram

Tf.js has a diagram outlining the index and name of each keypoint.

I personally find this kind of diagram very helpful when trying to find a landmark point quickly. I think there are similar diagrams for other tf.js landmark detection models. @MOQN Do you think we could display or link these diagrams on the new website?

I'm happy to hear any suggestions or ideas!

B2xx commented 1 year ago

Hi everyone, I think we ran into the same concerns with the model prediction output for face mesh, and I've put my solution here (Feedback needed!).


The tf.js original output for face mesh looks like this:

    [{
      box: {
        xMin: 304.6476503248806,
        xMax: 502.5079975897382,
        yMin: 102.16298762367356,
        yMax: 349.035215984403,
        width: 197.86034726485758,
        height: 246.87222836072945
      },
      keypoints: [
        {x: 406.53152857172876, y: 256.8054528661723, z: 10.2, name: "lips"},
        {x: 406.544237446397, y: 230.06933367750395, z: 8},
        ...
      ]
    }]

My idea for organizing the keypoints is to expose each feature's centerX, centerY, width and height, derived from some basic calculations.

   featuresData {
     "leftEye": {
        "centerX": ,
        "centerY": ,
        "width": ,
     },
     "rightEye": {
        "centerX": ,
        "centerY": ,
        "width": ,
     },
   }

This is my function for getting the data, and I'm wondering if we need to organize all the face features so that the user could use them directly or just leave an example?

//A function to store basic data for certain facial features, this example is for lips
//We have faceOval, rightEyebrow, leftEyebrow, rightEye, leftEye, lips
function featuresData(){
  if (predictions.length > 0){
    for (let i = 0; i < predictions.length; i += 1) {
      const face = predictions[i];
      const fKeypointX = [];
      const fKeypointY = [];
      for (let j = 0; j < face.keypoints.length; j += 1) {
        const keypoint = face.keypoints[j];
        // console.log(keypoint.name); //The name of all facial features

        //Collect the coordinates of every keypoint that belongs to the lips
        if (keypoint.name === "lips") {
          fKeypointX.push(keypoint.x);
          fKeypointY.push(keypoint.y);
        }
      }

      //Create an example object of important data of facial features
      const featuresData = {
        lips: {
          centerX: avg(fKeypointX),
          centerY: avg(fKeypointY),
          fWidth: length(fKeypointX),
          fheight: length(fKeypointY),
        },
      };
      // console.log(featuresData);
    }
  }

  //Midpoint between the smallest and largest value (p5.js max() and min() accept arrays)
  function avg(x){
    return (max(x)+min(x))/2;
  }
  //Distance between the smallest and largest value
  function length(x){
    return max(x)-min(x);
  }
}

Without Preset Nose

I found that the facemesh model does not have a preset nose area. Do we need to add one?

Keypoint Diagram

The keypoint diagram is really useful for me! I also added a function for users to get the index of the point closest to their mouse.


Here's my function to show the index of the points

//Show the index of the points
function directPoints(){
  let closest = 0;

  if (predictions.length > 0){
    for (let i = 0; i < predictions.length; i += 1) {
      const face = predictions[i];
      //distances from the mouse to each keypoint of this face
      let dMouse = [];
      for (let j = 0; j < face.keypoints.length; j += 1) {
          const keypoint = face.keypoints[j];

          //calculate the distance between mouse and points
          let d = dist(keypoint.x,keypoint.y,mouseX,mouseY);
          dMouse.push(d);
      }

      //the position of the smallest distance is the index of the closest keypoint
      let minimum = min(dMouse);
      closest = dMouse.indexOf(minimum);

      ellipse(face.keypoints[closest].x, face.keypoints[closest].y, 5, 5);
    }
  }
}



Feedback Needed!

shiffman commented 1 year ago

Hi @B2xx, if you take a look at @ziyuan-linn's latest in #35, this may help as a guide for the face keypoints!

One comment about your earlier post is that the featuresData property isn't a clear name for me. Does the API output an array of faces or just one face only? Regardless, I think any face object can include the "parts" directly along with a keypoints array. I'm imagining something like:

function gotFaces(faces) {
  // all faces
  console.log(faces);
  // one face
  console.log(faces[0]);
  // bounding box of face, not sure if x,y should be centered or top left?
  console.log(faces[0].x, faces[0].y, faces[0].width, faces[0].height);

  // all keypoints
  console.log(faces[0].keypoints);
  // one keypoint
  console.log(faces[0].keypoints[0].x, faces[0].keypoints[0].y);

  // x,y of a part (should this be center or top left?
  // should width and height also be included for part bounding box?)
  console.log(faces[0].mouth.x, faces[0].mouth.y);
  // all of the part keypoints
  console.log(faces[0].mouth.keypoints);
  // x,y of one part keypoint
  console.log(faces[0].mouth.keypoints[0].x, faces[0].mouth.keypoints[0].y);

  // etc.
}

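
One way a face object of this shape could be assembled from the raw keypoints, as a sketch only. `boundsOf` and `buildFace` are hypothetical helpers, `x` and `y` are taken to be the top-left corner (the question above is still open), and parts are keyed by the keypoint `name` from the model, so with the facemesh output the mouth would appear as `lips` instead of `mouth`.

```javascript
// Hypothetical helper: axis-aligned bounding box of a list of points (x, y = top left).
function boundsOf(points) {
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const xMin = Math.min(...xs);
  const yMin = Math.min(...ys);
  return {
    x: xMin,
    y: yMin,
    width: Math.max(...xs) - xMin,
    height: Math.max(...ys) - yMin,
  };
}

// Hypothetical helper: builds a face with its own box, all keypoints, and one entry per named part.
function buildFace(rawFace) {
  const parts = {};
  for (const point of rawFace.keypoints) {
    if (!point.name) continue; // unnamed points stay only in face.keypoints
    if (!parts[point.name]) parts[point.name] = [];
    parts[point.name].push(point);
  }
  const face = { ...boundsOf(rawFace.keypoints), keypoints: rawFace.keypoints };
  for (const [name, points] of Object.entries(parts)) {
    face[name] = { ...boundsOf(points), keypoints: points };
  }
  return face;
}

// Usage
const sampleFace = buildFace({
  keypoints: [
    { x: 10, y: 20, name: "lips" },
    { x: 30, y: 40, name: "lips" },
    { x: 0, y: 50 },
  ],
});
console.log(sampleFace.width, sampleFace.lips.x, sampleFace.lips.keypoints.length); // 30 10 2
```
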
B2xx commented 1 year ago

Hi @shiffman, we have updated the output of facemesh model according to @ziyuan-linn's latest in #35, and its output looks like this now!

    [{
        "box": { //<-----------------------add
            "height": 115.38676768541336,
            "width": 93.99256706237793,
            "xMax": 249.73242282867432,
            "xMin": ...,
            "yMax": ...,
            "yMin": ...,
        },
        "faceOval": [ //<-----------------------add
            {
                "x": 202.27954387664795,
                "y": 50.33646672964096,
                "z": 2.1165020763874054,
            },
            ...
        ],
        "keypoints": [
            {
                "x": 201.72533988952637,
                "y": 122.80799746513367,
                "z": 13.084457814693451,
                "name": "lips"
            },
            ...
        ],
        "leftEye":[], //<-----------------------add
        "leftEyebrow":[], //<-----------------------add
        "ringFinger":[], //<-----------------------add
        "lips":[], //<-----------------------add
        "rightEye":[], //<-----------------------add
        "rightEyebrow":[], //<-----------------------add
    }]

I have also opened a pull request to merge our newest facemesh-noeventestr into main. Could you take a look?

Thank you @ziyuan-linn for helping us debug the output of facemesh!

lindapaiste commented 1 year ago

One other option is to return an object which has methods and not just raw data. For example, we would define a HandPrediction class and return an instance of it.

Possible APIs:

prediction.getKeypoint('pinkyTip'); // returns x, y

prediction.getKeypoint3D('pinkyTip'); // returns x, y, z

prediction.getShape('ringFinger'); // returns an array of points?

prediction.getBoundingBox(); // returns the rectangle dimensions

prediction.getKeypoints(); // return the array of all x, y points

Let me know if you want my help with this.

shiffman commented 1 year ago

Hi @lindapaiste, thank you so much for following the continued development of this library! Your previous work and pull requests have been an invaluable resource as we look to reboot and release a "next generation" ml5.js!

I like this idea and see how it could help simplify things, especially for a face detection model which includes many parts, keypoints, etc. Returning a p5.Vector could also be very convenient (but then reduces compatibility outside of p5.js). Curious to hear from everyone else! cc @MOQN @gohai @ziyuan-linn @sproutleaf (and more!)

lindapaiste commented 1 year ago

> Returning a p5.Vector could also be very convenient (but then reduces compatibility outside of p5.js)

We could potentially return a p5.Vector when p5 is loaded and otherwise return a plain object with x, y, and z. The p5.Vector has properties x, y, and z, so the result would not be dramatically different between the two modes.

lindapaiste commented 1 year ago

Rough code based on the handpose data

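function maybeVector(point) {
    if (p5Utils.checkP5()) {
        const p5 = p5Utils.p5Instance;
        return p5.createVector(point.x, point.y, point.z);
    } else return point;
}

class DetectedHand {

    constructor(data) {
        // "data" is an assumed property name for the raw prediction
        this.data = data;
    }

    getKeypoints() {
        return this.data.keypoints.map(maybeVector);
    }

    getKeypoints3D() {
        return this.data.keypoints3D.map(maybeVector);
    }

    _findKeypoint(array, partName) {
        const point = array.find(point => point.name === partName);
        if (!point) {
            throw new Error(
                `No keypoint found with name ${partName}.\n
                Available names: ${array.map(point => point.name).join(', ')}`
            );
        }
        return maybeVector(point);
    }

    getKeypoint(partName) {
        return this._findKeypoint(this.data.keypoints, partName);
    }

    getKeypoint3D(partName) {
        return this._findKeypoint(this.data.keypoints3D, partName);
    }

    getShape(partName) {
        // may require a specific mapping of keypoints to parts
        // assumed here: a part's keypoint names start with the part name
        return this.data.keypoints
          .filter(point => point.name.startsWith(partName))
          .map(maybeVector);
    }
}
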
shiffman commented 4 months ago

I think this is also in a settled place as we move towards the 1.0 release, so perhaps this should also be closed? The discussion here is of course welcome to continue, but I'm hesitant to make any major API changes before release!