vladmandic / human

Human: AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition
https://vladmandic.github.io/human/demo/index.html
MIT License

Low Precision of FaceDetection model when used without FaceMesh #330

Closed: alexandernst closed this issue 1 year ago

alexandernst commented 1 year ago

Is there a way to get (or convert?) Mediapipe's FaceDetect model, the one with the 6 landmarks? ( https://codepen.io/mediapipe/full/dyOzvZM )

I'm currently using this model and the information it provides is more than enough for my needs.

vladmandic commented 1 year ago

i've tried it and really don't like that model - it's basically a simplified version of blazeface hard-coded for a single fixed stride size. it's about 5% faster, but has very little ability to adjust to different face sizes in the input.

if all you need is 6 landmarks, just use blazeface as-is (the default face detector in human) and disable facemesh.
if facemesh is disabled, human will automatically return landmarks from blazeface.
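
as a minimal sketch (assuming the standard Human constructor and async detect() api; adjust option names to the current config schema), something like this should be enough:

// minimal sketch: detector stays enabled (blazeface), mesh is turned off,
// so result.face[n].annotations come from blazeface's 6 landmarks
import { Human } from '@vladmandic/human';

const human = new Human({
  face: {
    enabled: true,
    mesh: { enabled: false },
    iris: { enabled: false },
  },
});

const result = await human.detect(input); // input: image/video/canvas element or tensor
console.log(result.face[0]?.annotations); // the 6 blazeface landmarks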

alexandernst commented 1 year ago

But blazeface won't return the same landmarks, right? I ran some quick tests and it returns quite a few more landmarks. Is there a way to map / match the 6 points returned by Mediapipe's model to blazeface's?

vladmandic commented 1 year ago

it's the same 6 points, as it's pretty much the same model, just simplified to work as fast as possible with a single face size.

if you're getting more, you likely didn't disable the mesh model.

let's confirm - use the config to disable everything face-related and leave just the detector (which is blazeface by default):

config.face = { enabled: true, mesh: { enabled: false }, attention: { enabled: false }, iris: { enabled: false }, description: { enabled: false }, emotion: { enabled: false } };

and then check the results:

console.log(result.face[0].annotations);

{
  leftEye: [ [ 101.13767498731613, 69.1371038556099 ] ],
  rightEye: [ [ 189.12629091739655, 69.66689601540565 ] ],
  nose: [ [ 136.84045220911503, 123.79820179194212 ] ],
  mouth: [ [ 139.43591246008873, 161.13025695085526 ] ],
  leftEar: [ [ 63.44403076171875, 71.0589550435543 ] ],
  rightEar: [ [ 243.3701298236847, 83.27123895287514 ] ],
}
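
if only the six raw points are needed, they can be flattened from that annotations object, roughly like this (a sketch; each annotation here is an array of [x, y] pairs, as in the output above):

// sketch: collect the six blazeface landmarks as { name, x, y } entries
const face = result.face[0];
const points = Object.entries(face.annotations).map(([name, pts]) => ({
  name,
  x: pts[0][0],
  y: pts[0][1],
}));
console.log(points); // leftEye, rightEye, nose, mouth, leftEar, rightEar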

but... i just found a bug where results will not be returned under some circumstances, i'll do a push in a couple of minutes

vladmandic commented 1 year ago

human 3.0.3 is published

alexandernst commented 1 year ago

Hi again! I wanted to double-check everything before replying.

I ran some tests and I'm very confident that the blazeface model has major flaws.

In order to conduct my tests, I used the following configuration with the main demo:

let userConfig = {
  backend: 'wasm',
  face: {
    enabled: true,
    mesh: { enabled: false },
    iris: { enabled: false },
    description: { enabled: false },
    emotion: { enabled: false },
  },
  filter: { enabled: false, flip: false },
  object: { enabled: false },
  gesture: { enabled: false },
  hand: { enabled: false, maxDetected: 1, minConfidence: 0.5, detector: { modelPath: 'handtrack.json' } },
  body: { enabled: false },
};

const drawOptions = {
  bufferedOutput: true, // makes draw functions interpolate results between each detection for smoother movement
  drawBoxes: true,
  drawGaze: false,
  drawLabels: false,
  drawGestures: false,
  drawPolygons: false,
  drawPoints: true,
  pointSize: 6,
  fillPolygons: false,
  useCurves: false,
  useDepth: true,
};
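
For completeness, this is roughly how options like these are fed into Human's detect/draw loop (a sketch; videoElement and outputCanvas are placeholders, not the actual demo variables):

// sketch: run detection with the config above and draw landmarks on a canvas
const human = new Human(userConfig);

async function loop() {
  const result = await human.detect(videoElement);          // videoElement: webcam <video>
  const interpolated = human.next(result);                  // interpolated/smoothed result
  await human.draw.all(outputCanvas, interpolated, drawOptions);
  requestAnimationFrame(loop);
}
loop();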

[screenshot attachment]

Aside from the points not matching the positions they should be at, the model itself doesn't seem to be stable: the points change position sporadically:

https://user-images.githubusercontent.com/89727/212724000-59a1f743-ffe7-4ecc-9ef6-66850423b753.mov

Hence my initial request about converting mediapipe's model to TF.

alexandernst commented 1 year ago

Just to add some more info, I tested the blazeface demo itself and the points are mispositioned there as well. Link to the demo: https://storage.googleapis.com/tfjs-models/demos/blazeface/index.html

vladmandic commented 1 year ago

i am more than open to looking into what's wrong with the blazeface model, but i will not include the newer mediapipe model as i find it inferior - it cannot deal with different face sizes, it's really only good for a face-in-front-of-webcam scenario.

alexandernst commented 1 year ago

I'm also open to using blazeface model, if it were to work correctly. Please tell me if I can help debug anything further 🙏

vladmandic commented 1 year ago

i have something to start with, will update when i find a bit of time to work on it, it's not a trivial one.

vladmandic commented 1 year ago

sorry this took a while, but i was busy with another project. anyhow, i've just pushed a major update on github that reimplements landmarks for blazeface.

yes, blazeface is far from perfect, as it's meant to be only a lightweight face detector that runs before the face is processed by the mesh module, after which blazeface results are pretty much discarded/replaced - but this update makes blazeface at least viable when mesh is disabled.
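
one way to see that difference is to compare what is returned with and without the mesh model (a sketch; image is a placeholder input and exact point counts depend on the enabled model variants):

// sketch: with mesh enabled, face[0].mesh holds the full facemesh (~468 points)
// and annotations are derived from it; with mesh disabled, only the
// 6 blazeface landmarks are reported
let result = await human.detect(image, { face: { mesh: { enabled: true } } });
console.log(result.face[0].mesh.length);

result = await human.detect(image, { face: { mesh: { enabled: false } } });
console.log(Object.keys(result.face[0].annotations));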

alexandernst commented 1 year ago

Great news!! Let me test this and I'll report back :)

vladmandic commented 1 year ago

i'm closing this issue for now, so this part of the code can be marked as resolved.
i'm open to including any additional suggestions, so feel free to either continue on this thread or open a new issue.

alexandernst commented 1 year ago

Hi! I tested this, but unfortunately it's still not fixed. See attachment:

[screenshot attachment]

Landmarks aren't "jumping" anymore, but they are way off from the positions they should be at.

vladmandic commented 1 year ago

hmm, i'll re-check the new scaling. at least the other stuff seems to be working.
based on your example, it seems there is an incorrect offset towards up+right, but the general scale is ok.

alexandernst commented 1 year ago

Actually, "+up+right" doesn't seem to be altways the case. Try moving around and you'll see that the points offset error varies.

Examples:

[screenshot attachment]

[screenshot attachment]

vladmandic commented 1 year ago

I'll test it in the next few days. Pretty sure it's going to be annoying to nail down and then the fix will be one line.

vladmandic commented 1 year ago

updated, can you try?

alexandernst commented 1 year ago

Hi! I just tested it. The new demo seems to be working as expected :)

One final question, though. Is there a way I can get the left/right face edge, instead of the left/right ear landmarks?

vladmandic commented 1 year ago

> Hi! I just tested it. The new demo seems to be working as expected :)

Good! Closing the issue as resolved, but feel free to keep posting on this thread.

> Is there a way I can get the left/right face edge, instead of the left/right ear landmarks?

Closest I can think of would be to expose blazeface decoded box results before post-processing: https://github.com/vladmandic/human/blob/1bf65413fe5d3a437ea80247647b5e5510816aab/src/face/blazeface.ts#L101

But it would have to be surfaced in higher-level modules as well to be user-consumable. Not sure I want to expose another dataset without a strong use-case - what would that be?
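
in the meantime, a rough user-side approximation (a sketch; it uses the detection box that is already in the results rather than true face-contour points) would be:

// sketch: approximate left/right face edges from the detection box,
// at the vertical level of the left eye
const face = result.face[0];
const [x, y, width, height] = face.box;          // box is [x, y, width, height] in pixels
const eyeY = face.annotations.leftEye[0][1];
const leftEdge = [x, eyeY];                      // left border of the detected face box
const rightEdge = [x + width, eyeY];             // right border of the detected face box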

alexandernst commented 1 year ago

Hi @vladmandic! I'm sorry for reviving this once again, but I'm seeing performance issues (maybe in the model itself?).

Here is a comparison between your implementation of blazeface and mediapipe's demo. Notice how the landmark tracking is much slower with blazeface.

https://user-images.githubusercontent.com/89727/222159232-8ded2d6c-6d47-4e6e-975e-40526c650e18.mp4

-

And here is a recording of the facemesh demo, which is running smoothly as expected:

https://user-images.githubusercontent.com/89727/222159890-5d9fc89a-9900-404c-b418-b949bd718aa7.mp4

-

Can you confirm if this is an issue with the model itself or with the implementation?

vladmandic commented 1 year ago

i'm not seeing any major performance issues on my system, blazeface runs at a constant 30fps. one difference in behavior: if facemesh is disabled, face caching gets disabled as well, as there is insufficient data to make assumptions - so blazeface runs on each frame instead of being skipped. the slow tracking comes from interpolation - but it should still track much faster than in your recording.

if you set config.async = false (the default is async execution, in which case performance stats are bundled for all models), you can dump the result.performance object to gather some stats - can you share them (with mesh enabled and disabled)?
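
something along these lines should produce comparable numbers (a sketch; video here is a placeholder input and the exact keys in result.performance depend on which models are enabled):

// sketch: gather per-model timings with and without facemesh
const human = new Human({ async: false, backend: 'wasm' });

const withMesh = await human.detect(video, { face: { mesh: { enabled: true } } });
console.log('mesh enabled:', withMesh.performance);

const noMesh = await human.detect(video, { face: { mesh: { enabled: false } } });
console.log('mesh disabled:', noMesh.performance);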