multiple enhancements to body model

vladmandic commented 3 years ago

I tried to use the MediaPipe API in my project, but unfortunately it doesn't seem to support web workers (a must in my case, since there are some intensive 3D animations and little room in the main UI thread for other CPU-intensive tasks). So in the end I tried your @vladmandic human library instead, but I encountered some issues.

I loaded human.js inside worker (via importScripts), but I have to load the human object as new Human.default() instead of new Human() (I can import human.esm.js as module, but I want to avoid that in web worker).
.warmup() doesn't seem to work as worker doesn't have Image.
The accuracy of your PoseNet model is low, obviously lower than the default one from TFJS. Is it possible to change some parameters so that it is on par with the TFJS one, or even have the option to load ResNet instead of Mobilenet?

Originally posted by @ButzYung in https://github.com/vladmandic/human/discussions/47#discussioncomment-210604

vladmandic commented 3 years ago

I've converted this conversation to an issue as it's better fitted and I can track enhancements.

I loaded human.js inside worker (via importScripts), but I have to load the human object as new Human.default() instead of new Human() (I can import human.esm.js as module, but I want to avoid that in web worker).

Sure, that is ok.
Although I'm curious: what benefits are you seeing from loading via importScripts instead of importing the ESM module?
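
For readers loading the bundle the same way, here is a minimal sketch of coping with both export shapes. It assumes only what is described above: that the script loaded via importScripts exposes a global `Human` whose constructor may sit under `.default`. The helper name `resolveHumanConstructor` and the script path in the comment are hypothetical.

```javascript
// Hypothetical helper: returns the constructor whether the global is the
// class itself or a namespace object carrying the class under `.default`.
function resolveHumanConstructor(globalHuman) {
  if (typeof globalHuman === 'function') return globalHuman;
  if (globalHuman && typeof globalHuman.default === 'function') return globalHuman.default;
  throw new Error('Human constructor not found');
}

// Inside a worker this would follow importScripts('human.js') (path is an assumption):
//   const HumanCtor = resolveHumanConstructor(self.Human);
//   const human = new HumanCtor();
```

The same worker code then keeps working if a later build exposes the class directly.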

.warmup() doesn't seem to work as worker doesn't have Image.

Ahhh, the magic of missing items in workers hits again - I'll try to find an alternative.
Image is used only to load embedded JPEG data that is used for warmup as all browsers have a built-in decoder.
I could embed ImageData instead, but as that is uncompressed it would increase library size.
Or I could use a 3rd party JPEG parser, but I try to limit additional dependencies.
Will figure out something.

The accuracy of your PoseNet model is low, obviously lower than the default one from TFJS. Is it possible to change some parameters so that it is on par with the TFJS one

The PoseNet model has only some optimizations that should not impact its accuracy - did you try changing the default parameters in config?

have the option to load ResNet instead of Mobilenet?

That should not be a problem. The only reason I avoided it is that it's too big to embed. I'll run some tests tomorrow.

ButzYung commented 3 years ago

Sure, that is ok. Although I'm curious: what benefits are you seeing from loading via importScripts instead of importing the ESM module?

I want to have the option to switch between human and the conventional TFJS (loaded via importScripts), as it is not possible to use both import and importScripts in worker at the same time. Maybe I can load TFJS via import as well, but module support in web worker is still fairly new (Chrome 80+, no Firefox), so browser support is a concern.

The PoseNet model has only some optimizations that should not impact its accuracy - did you try changing the default parameters in config?

Yeah, my config is customized. I have disabled all face-related models, leaving only body and hand. For body I only need to detect one person, so I set maxDetections to 1. But that doesn't seem to be the cause of the problem. Even if I leave the body config untouched, the accuracy is still the same. In fact, if I don't lower scoreThreshold to something below 0.5, most of the time the body is not detected at all. Even if it is detected this way, the scores of some body parts are low and the arms are "jumping" here and there (my app focuses on upper body detection, not the full body).

vladmandic commented 3 years ago

Maybe I can load TFJS via import as well, but module support in web worker is still fairly new (Chrome 80+, no Firefox), so browser support is a concern.

True. And it's even worse for mobile platform - Chrome still doesn't support modules there.
For my apps I prefer to use imports as usual to avoid unnecessary complications with importScripts, but then create a bundle at the end and load that bundle instead. If you look at npm run dev, that is exactly what it does on each source file change.

In fact if I don't lower scoreThreshold to something below 0.5, most of the time the body is not detected at all. Even if it is detected this way, the scores of some body parts are low and the arms are "jumping" here and there

Strange - I'll investigate.

vladmandic commented 3 years ago

update: i've spent too much time trying to work with a broken tfjs 2.8.0 release, so i just downgraded back to tfjs 2.7.0 and re-implemented warmup() so it should work with web workers.

vladmandic commented 3 years ago

update: regarding your comment on scoreThreshold - note that body score is just average of scores for each keypoint. so if you're looking at just upper body and lower body is hidden, then average score is going to be low although score for upper body parts is high. if looking at upper body only, set scoreThreshold to low value such as 0.1 and check for each keypoint.score in your app instead.
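
A minimal sketch of that per-keypoint check follows. The keypoint names use the standard PoseNet part names. The result shape (`part` and `score` on each keypoint) and the helper names are assumptions for illustration, not confirmed Human API.

```javascript
// Upper-body parts, using PoseNet's part naming.
const UPPER_BODY = [
  'nose', 'leftEye', 'rightEye', 'leftEar', 'rightEar',
  'leftShoulder', 'rightShoulder', 'leftElbow', 'rightElbow', 'leftWrist', 'rightWrist',
];

// Average of all keypoint scores, which is how the overall body score is described above.
function averageScore(keypoints) {
  if (keypoints.length === 0) return 0;
  return keypoints.reduce((sum, kp) => sum + kp.score, 0) / keypoints.length;
}

// Keep only upper-body keypoints whose own score passes the threshold.
function reliableUpperBody(keypoints, minScore = 0.5) {
  return keypoints.filter((kp) => UPPER_BODY.includes(kp.part) && kp.score >= minScore);
}

// Made-up example: upper body visible, legs hidden behind a desk.
const keypoints = [
  { part: 'leftShoulder', score: 0.9 },
  { part: 'rightShoulder', score: 0.9 },
  { part: 'leftElbow', score: 0.8 },
  { part: 'leftKnee', score: 0.1 },
  { part: 'rightKnee', score: 0.1 },
  { part: 'leftAnkle', score: 0.1 },
];

console.log(averageScore(keypoints).toFixed(2)); // 0.48, below a 0.5 body threshold
console.log(reliableUpperBody(keypoints).map((kp) => kp.part));
```

In this example the body as a whole would be rejected at a 0.5 threshold even though all three visible upper-body keypoints score 0.8 or higher.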

vladmandic commented 3 years ago

update: i've implemented special case for single body detection (i didn't have that special case for body, only for hand), but i really don't like how it behaves - keypoints are accurate, but it cannot determine left vs right so every few frames points from left hand get switched to right hand and vice versa.

The difference is in performance, not precision - single pose just uses argmax() to pick the most likely keypoint out of all candidates, while multi pose actually traverses the tree to find the most likely neighbor.
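
To illustrate the single-pose idea, here is a small sketch of an argmax over per-keypoint heatmaps. It is not the code used in Human: the plain nested arrays, the helper name `argmax2d`, and the sample values are all made up for the example.

```javascript
// Returns the position and score of the highest-scoring cell in one keypoint's heatmap.
function argmax2d(heatmap) {
  let best = { x: 0, y: 0, score: -Infinity };
  for (let y = 0; y < heatmap.length; y++) {
    for (let x = 0; x < heatmap[y].length; x++) {
      if (heatmap[y][x] > best.score) best = { x, y, score: heatmap[y][x] };
    }
  }
  return best;
}

// Hypothetical 3x3 heatmaps for two keypoints.
const heatmaps = {
  leftWrist: [[0.1, 0.2, 0.1], [0.1, 0.9, 0.3], [0.0, 0.2, 0.1]],
  rightWrist: [[0.7, 0.1, 0.0], [0.2, 0.1, 0.0], [0.0, 0.0, 0.1]],
};

// Each keypoint is picked on its own, with no reference to its neighbors.
const pose = Object.entries(heatmaps).map(([part, map]) => ({ part, ...argmax2d(map) }));
console.log(pose);
```

Because every keypoint is chosen independently, nothing in this approach ties a wrist to the matching elbow or shoulder, which may be one reason left and right can swap between frames.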

Try it out, but most likely I'd remove this.

vladmandic commented 3 years ago

update: Work on Body MobileNet model

I've switched default in Human to MobileNet 100% float16 instead of MobileNet 75% float32 and I like it much more.
I have to keep it small and performant, but you can try other variations. E.g., MobileNet with 8 strides is much slower than with 16 strides - which makes sense, since it analyzes a 4 times bigger matrix (each area that is analyzed is image size divided by stride, vertically and horizontally).
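
The 4x figure can be checked with a few lines of arithmetic. This is only a sketch of the "image size divided by stride" description; the input size of 256 and the helper name `gridCells` are assumptions.

```javascript
// Number of grid cells evaluated for a square input: (size / stride) per side, squared.
function gridCells(inputSize, outputStride) {
  const side = Math.ceil(inputSize / outputStride);
  return side * side;
}

const inputSize = 256; // assumed input resolution, for illustration only

console.log(gridCells(inputSize, 16)); // 256
console.log(gridCells(inputSize, 8)); // 1024
console.log(gridCells(inputSize, 8) / gridCells(inputSize, 16)); // 4
```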

You can try different MobileNet models like this:

```js
const userConfig = {
  face: { enabled: false },
  body: { enabled: true, modelPath: 'https://storage.googleapis.com/tfjs-models/savedmodel/posenet/mobilenet/quant2/100/model-stride16.json', outputStride: 16 },
  hand: { enabled: false },
};
```

where models are:


(and few other variations)

vladmandic commented 3 years ago

update: Work on Body ResNet model

Ok, this was a bit messier than I wanted since ResNet and MobileNet models return results in different order (?!), but finally Human is compatible with both.

You can enable ResNet models like this:

```js
const userConfig = {
  face: { enabled: false },
  body: { enabled: true, modelType: 'ResNet', modelPath: 'https://storage.googleapis.com/tfjs-models/savedmodel/posenet/resnet50/quant2/model-stride16.json', outputStride: 16 },
  hand: { enabled: false },
};
```

Where some of the models are:

https://storage.googleapis.com/tfjs-models/savedmodel/posenet/resnet50/float/model-stride16.json
https://storage.googleapis.com/tfjs-models/savedmodel/posenet/resnet50/float/model-stride32.json
https://storage.googleapis.com/tfjs-models/savedmodel/posenet/resnet50/quant2/model-stride16.json
https://storage.googleapis.com/tfjs-models/savedmodel/posenet/resnet50/quant2/model-stride32.json



I've tried float16 (quant2) at 16 strides and it's really precise, but performance is less than 50% of MobileNet (not surprising).
If you have good hardware, this is definitely the way to go.

Of course, you can download model json and shards and load them from your local storage instead of Google storage.
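
A sketch of what a locally hosted variant could look like. It reuses the config keys shown earlier in this thread. The folder `../models/posenet-resnet50/` is hypothetical, and the weight shards are assumed to sit next to the model json so that the relative shard paths inside it resolve.

```javascript
const userConfig = {
  face: { enabled: false },
  // '../models/posenet-resnet50/' is a hypothetical local folder holding
  // model-stride16.json together with its downloaded weight shards
  body: { enabled: true, modelType: 'ResNet', modelPath: '../models/posenet-resnet50/model-stride16.json', outputStride: 16 },
  hand: { enabled: false },
};
```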

vladmandic commented 3 years ago

update: I've figured out what the difference is in MediaPipe's new compiled demo.

Anyhow, model returns point cloud for each keypoint (e.g. there are multiple possible points for left elbow). JS code finds one with highest score and that's it - but difference between them can be 0.0001% and switching back and forth, so result is "jumpy".

Compiled one finds average between all points with high scores (as they are all correct) - thus the result is much smoother output and confidence score is still high.

It would be doable to do in JS as well, but it's not high on my list right now.
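
For reference, a rough sketch of how such averaging could look in JS. This is not MediaPipe's implementation: it swaps in a score-weighted average over candidates within a tolerance of the best score, and the candidate shape, the `tolerance` parameter, and the helper name `smoothKeypoint` are all assumptions.

```javascript
// Picks a keypoint position as the score-weighted average of all candidates
// whose score is within `tolerance` of the best one, instead of the single best.
function smoothKeypoint(candidates, tolerance = 0.05) {
  if (candidates.length === 0) return null;
  const best = Math.max(...candidates.map((c) => c.score));
  const close = candidates.filter((c) => best - c.score <= tolerance);
  const total = close.reduce((sum, c) => sum + c.score, 0);
  return {
    x: close.reduce((sum, c) => sum + c.x * c.score, 0) / total,
    y: close.reduce((sum, c) => sum + c.y * c.score, 0) / total,
    score: total / close.length,
  };
}

// Made-up candidates for the left elbow: two near-identical peaks and one outlier.
const leftElbow = [
  { x: 100, y: 200, score: 0.9 },
  { x: 104, y: 202, score: 0.9 },
  { x: 300, y: 50, score: 0.2 },
];

console.log(smoothKeypoint(leftElbow));
```

With plain argmax the result would flip between the first two candidates whenever their scores trade places by a tiny margin. The averaged result stays between them at roughly (102, 201), and the low-scoring outlier is ignored.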

Anyhow, that's all from me for now on this thread - major work and 6 updates.

ButzYung commented 3 years ago

Been testing the new version of body detection. Unfortunately it's still not that accurate and still "jumpy". Changing models doesn't seem to help much. ResNet is better, but not much. What's even stranger is that when maxDetections is left as default or set higher than 1, it sometimes detects more than 1 body (the max I noticed was 4), even though I am the only one sitting in front of the camera LOL. When maxDetections is 1, the behavior is somewhat different, as you mentioned: it sometimes mixes up left and right hands (something I already noticed when using the TFJS version in the past, though it was less serious). But besides the hands problem, maxDetections=1 still looks better than >1 (which is even more jumpy) in general.

I notice that the body detection of your live demo looks better than how I use it in my app in general. I have tried various config combo and can't really figure out the reason. Maybe it's because I am running my app on electron but not a native browser? Web worker (BTW your demo doesn't seem to work with web worker option ON)?

vladmandic commented 3 years ago

maxDetections=1 still looks better than >1

Ok, I'll leave it in.

ResNet is better, but not much.

Strange, I find it much better, to the point of being really good except for some jitter due to lack of smoothing (see my notes on keypoint cloud).
Btw, I did notice that body models in general are very sensitive to lighting conditions and tend to be quite jumpy in darker areas.

your demo doesn't seem to work with web worker option ON

I'll take a look. It was working, but I probably broke it unintentionally.

I notice that the body detection of your live demo looks better than how I use it in my app in general.

Electron vs browser shouldn't matter.
Which backend are you using? I believe there is a rounding issue in WASM backend at the moment which can cause some precision errors (I have several issues on that open with TFJS team), better try WebGL if that is an option.

ButzYung commented 3 years ago

Which backend are you using? I believe there is a rounding issue in WASM backend at the moment which can cause some precision errors (I have several issues on that open with TFJS team), better try WebGL if that is an option.

WebGL. Checking the console but couldn't see anything wrong in the config or any config difference between your demo and my app. The only "difference" is the reports on tf flags, in which your demo seems to show more details, but for properties that exist on both sides, they return the same value.

On a side note, I can have both human (without body, just hand detection) and TFJS PoseNet (loaded in the conventional way) running at the same time. Yeah it's clumsy, but at least it works lol

vladmandic commented 3 years ago

Can't explain the difference since I haven't seen your app.

Regarding Human demo and web workers - I just tried it in Chrome & Edge and it works. Even better if you enable buffered output as then UI refresh is completely detached from processing.

Which browser? Firefox is missing several features and I've decided not to support it for web workers currently. It would be possible, but a major pain - I'd rather wait for the Firefox team to finally implement things like OffscreenCanvas.

Update: Ahhh, GitHub Pages, which hosts the live demo, resolves relative paths differently than my local environment, so worker.js was not even loading (error 404).

vladmandic commented 3 years ago

i'm closing this issue as a lot of things were worked on here, but feel free to open a new one to track further work.

vladmandic commented 3 years ago

FYI, I've managed to successfully convert and implement MediaPipe's BlazePose model as alternative to PoseNet.
PoseNet is still the default, but it can be switched via configuration options. If interested, check out the model notes, as its performance is quite different.

vladmandic / human

multiple enhancements to body model #48