Closed — alexandernst closed this issue 1 year ago
i've tried it and really don't like that model - it's basically a simplified version of blazeface hard-coded for a single fixed stride size. it's about 5% faster, but has very little ability to adjust to different face sizes in the input.
if all you need is 6 landmarks, just use `blazeface` as-is (the default face detector in `human`) and disable `facemesh`.
if `facemesh` is disabled, `human` will automatically return landmarks from `blazeface`
But blazeface won't return the same landmarks, right? I ran some quick tests and it returns quite a few more landmarks. Is there a way to map/match the 6 points returned by Mediapipe's model to blazeface's?
it's the same 6 points, as it's pretty much the same model, just simplified to work as fast as possible with a single size of the face.
if you're getting more, you likely didn't disable the mesh model.
let's confirm - using config to disable everything related to face and leave just the detector (which is blazeface by default):

```js
config.face: { enabled: true, mesh: { enabled: false }, attention: { enabled: false }, iris: { enabled: false }, description: { enabled: false }, emotion: { enabled: false } },
```

and then checking results:

```js
console.log(result.face[0].annotations);
```

```js
{
  leftEye: [ [ 101.13767498731613, 69.1371038556099 ] ],
  rightEye: [ [ 189.12629091739655, 69.66689601540565 ] ],
  nose: [ [ 136.84045220911503, 123.79820179194212 ] ],
  mouth: [ [ 139.43591246008873, 161.13025695085526 ] ],
  leftEar: [ [ 63.44403076171875, 71.0589550435543 ] ],
  rightEar: [ [ 243.3701298236847, 83.27123895287514 ] ],
}
```
but... i just found a bug where results will not be returned under some circumstances, i'll do a push in a couple of minutes
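For completeness, a small helper like this can pull those 6 points out of a result object. The result shape is assumed from the `console.log` dump above, not taken from the library's type definitions, so verify it against your version:

```js
// Sketch: extract the 6 blazeface landmarks from a Human result object.
// Annotation names match the dump shown above; each annotation is assumed
// to be an array containing a single [x, y] pair when mesh is disabled.
const LANDMARK_NAMES = ['leftEye', 'rightEye', 'nose', 'mouth', 'leftEar', 'rightEar'];

function extractLandmarks(result) {
  const annotations = result?.face?.[0]?.annotations;
  if (!annotations) return null; // no face detected (or unexpected shape)
  const points = {};
  for (const name of LANDMARK_NAMES) {
    const entry = annotations[name];
    if (Array.isArray(entry) && entry.length > 0) points[name] = entry[0];
  }
  return points;
}
```

This also gives a natural place to map the points onto Mediapipe's keypoint order, since both models expose the same 6 semantic locations.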
human 3.0.3 is published
Hi again! I wanted to double-check everything before replying.
I made some tests and I'm very confident that the blaze model has major flaws.
In order to conduct my tests, I used the following configuration with the main demo:
```js
let userConfig = {
  backend: 'wasm',
  face: {
    enabled: true,
    mesh: { enabled: false },
    iris: { enabled: false },
    description: { enabled: false },
    emotion: { enabled: false },
  },
  filter: { enabled: false, flip: false },
  object: { enabled: false },
  gesture: { enabled: false },
  hand: { enabled: false, maxDetected: 1, minConfidence: 0.5, detector: { modelPath: 'handtrack.json' } },
  body: { enabled: false },
};

const drawOptions = {
  bufferedOutput: true, // makes draw functions interpolate results between each detection for smoother movement
  drawBoxes: true,
  drawGaze: false,
  drawLabels: false,
  drawGestures: false,
  drawPolygons: false,
  drawPoints: true,
  pointSize: 6,
  fillPolygons: false,
  useCurves: false,
  useDepth: true,
};
```
Aside from the fact that the points don't match the positions they should have, the model itself doesn't seem to be stable: the points change position sporadically:
https://user-images.githubusercontent.com/89727/212724000-59a1f743-ffe7-4ecc-9ef6-66850423b753.mov
Hence my initial request about converting mediapipe's model to TF.
Just to add some more info, I tested the blazeface demo itself and the mispositioning of the points happens there as well. Link to the demo: https://storage.googleapis.com/tfjs-models/demos/blazeface/index.html
i am more than open to looking into what's wrong with the blazeface model, but i will not include the newer mediapipe model as i find it inferior - it cannot deal with different face sizes, it's only good for face-in-front-of-webcam.
I'm also open to using blazeface model, if it were to work correctly. Please tell me if I can help debug anything further 🙏
i have something to start with, will update when i find a bit of time to work on it, it's not a trivial one.
sorry this took a while, but i was busy with another project. anyhow, i've just pushed a major update on github that reimplements landmarks for blazeface:

- interpolation via `human.next` did nothing for them, thus results were jumpy (now fixed)
- `human.draw` methods now draw landmarks if mesh is disabled

yes, blazeface is far from perfect as it's supposed to be a lightweight face detector only, before the face is processed in the mesh module (after which blazeface results are pretty much discarded/replaced), but this makes blazeface at least viable if mesh is disabled.
Great news!! Let me test this and I'll report back :)
i'm closing this issue for now, so this part of the code can be marked as resolved.
i'm open to include any additional suggestions, so feel free to either continue on this thread or open a new issue.
Hi! I tested this, but unfortunately it's still not fixed. See attachment:
Landmarks aren't "jumping" anymore, but they are way off from the positions they should be at.
hmm, i'll re-check the new scaling. at least the other stuff seems to be working.
based on your example, it seems there is an incorrect offset to +up+right, but the general scale is ok.
Actually, "+up+right" doesn't seem to always be the case. Try moving around and you'll see that the points' offset error varies.
Examples:
I'll test it in the next few days. Pretty sure it's going to be annoying to nail down and then the fix will be one line.
updated, can you try?
Hi! I just tested it. The new demo seems to be working as expected :)
One final question, though. Is there a way I can get the left/right face edge, instead of the left/right ear landmarks?
> Hi! I just tested it. The new demo seems to be working as expected :)
Good! Closing the issue as resolved, but feel free to keep posting on this thread.
> Is there a way I can get the left/right face edge, instead of the left/right ear landmarks?
Closest I can think of would be to expose `blazeface` decoded box results before post-processing:
https://github.com/vladmandic/human/blob/1bf65413fe5d3a437ea80247647b5e5510816aab/src/face/blazeface.ts#L101
But it would have to be surfaced in higher-level modules as well to be user-consumable. Not sure I want to expose another dataset without a strong use-case - what would that be?
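In the meantime, one workaround that needs no new API surface is to approximate the face edges from the detection box Human already returns. This sketch assumes `face.box` has the `[x, y, width, height]` shape and uses the ear landmark y coordinates only for vertical placement; both assumptions should be verified against your Human version:

```js
// Sketch: approximate left/right face edge points from the detection box.
// Assumes face.box is [x, y, width, height] (verify for your version);
// the ear landmarks, when present, supply the vertical coordinate.
function faceEdges(face) {
  const [x, y, width, height] = face.box;
  const midY = y + height / 2; // fallback if ear landmarks are missing
  const leftY = face.annotations?.leftEar?.[0]?.[1] ?? midY;
  const rightY = face.annotations?.rightEar?.[0]?.[1] ?? midY;
  return {
    leftEdge: [x, leftY],            // left side of the detection box
    rightEdge: [x + width, rightY],  // right side of the detection box
  };
}
```

This is only a box-based approximation, not the decoded pre-post-processing results discussed above, but it may be close enough when all that's needed is a rough face extent.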
Hi @vladmandic ! I'm sorry for respawning this once again, but I'm detecting performance issues (maybe in the model itself?).
Here is a comparison between your implementation of blazeface and mediapipe's demo. Check how the tracking of the landmarks is much slower in blazeface.
https://user-images.githubusercontent.com/89727/222159232-8ded2d6c-6d47-4e6e-975e-40526c650e18.mp4
And here is a recording of the facemesh demo, which is running smoothly as expected:
https://user-images.githubusercontent.com/89727/222159890-5d9fc89a-9900-404c-b418-b949bd718aa7.mp4
Can you confirm if this is an issue with the model itself or with the implementation?
i'm not seeing any major performance issues on my system, blazeface runs at a constant 30fps. one difference in behavior: if facemesh is disabled, face caching gets disabled as well, since there is insufficient data to make assumptions - so blazeface runs on each frame instead of being skipped. and slow tracking comes from interpolation - but it should still track much faster.
if you set `config.async = false` (default is async execution, and then performance stats are bundled for all models), you can dump the `result.performance` object to gather some stats - can you share them (with mesh enabled and disabled)?
Is there a way to get (or convert?) Mediapipe's FaceDetect model, the one with the 6 landmarks? ( https://codepen.io/mediapipe/full/dyOzvZM )
I'm currently using this model and the information it provides is more than enough for my needs.