vladmandic / human

Human: AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition
https://vladmandic.github.io/human/demo/index.html
MIT License

model tuning and evaluation #15

Closed orpgol closed 4 years ago

orpgol commented 4 years ago

Hi, I was wondering if there is any reason not to support "tf.loadLayersModel" which will open the possibility of using more pretrained models that are available.

Would it make sense to add a condition and a Layers vs. GraphModel setting to the config file?
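A minimal sketch of what such a condition could look like. The `format` field and the `loadModel` helper are hypothetical (they are not part of human's config); `tf.loadLayersModel` and `tf.loadGraphModel` are the actual TensorFlow.js loaders. The `tf` object is passed in so the snippet makes no assumption about how the library is imported.

```javascript
// Hypothetical helper: picks a TensorFlow.js loader based on an assumed
// `format` field in the model config. Anything other than 'layers'
// falls back to the graph-model loader.
function loadModel(tf, modelConfig) {
  if (modelConfig.format === 'layers') {
    return tf.loadLayersModel(modelConfig.modelPath); // Keras-style layers model
  }
  return tf.loadGraphModel(modelConfig.modelPath); // converted/frozen graph model
}
```

Both loaders return a promise, so a caller would typically write `const model = await loadModel(tf, { modelPath: 'path/to/model.json', format: 'layers' });`, where the path is a placeholder. As noted in the reply below, loading is only half of the problem: each model still needs code that understands its outputs.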

vladmandic commented 4 years ago

I use such an approach in my apps, I just didn't see the need here as each model is hand-picked and further optimized for use in human.

If you have a specific model in mind, I'm listening...

orpgol commented 4 years ago

We wanted to compare the emotion detection model you have here with the one available in face-api.

vladmandic commented 4 years ago

those are very different

face-api loads weights and performs most processing in library itself, it's not even a full model on its own

plus each model has very different output definitions that require different processing - so loading a different model can't work if there is no code to parse its output

best is to run human and face-api on the same set of test images so the comparison is on identical ground - that is something on my to-do list (not just for emotion, but for age and overall performance)
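One way to make such a comparison concrete is to score both libraries per label on the same labelled images. The sketch below is only an illustration: the `perLabelAccuracy` helper and the `{ expected, predicted }` sample shape are assumptions, and the predictions would have to come from running each library separately.

```javascript
// Hypothetical helper: computes accuracy per expected label from a list of
// { expected, predicted } samples collected on the same test images.
function perLabelAccuracy(samples) {
  const stats = {};
  for (const { expected, predicted } of samples) {
    if (!stats[expected]) stats[expected] = { correct: 0, total: 0 };
    stats[expected].total += 1;
    if (predicted === expected) stats[expected].correct += 1;
  }
  const accuracy = {};
  for (const [label, s] of Object.entries(stats)) {
    accuracy[label] = s.correct / s.total;
  }
  return accuracy;
}
```

Running it once per library on identical images gives two tables that can be compared label by label, which makes differences on a single class (for example neutral) visible instead of being averaged away.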

orpgol commented 4 years ago

yep that makes sense. thanks!

vladmandic commented 4 years ago

i've spent significant time working on face-api: to make it work with newer tfjs, fix broken models and rebuild with modern typescript

in the end, i've created a drastically updated fork: https://github.com/vladmandic/face-api

but the models face-api uses have not been maintained for a long time, so any further improvement is impossible.

so i've started on a brand new project instead - that's how human started
and yes, it's still in pre-release and undergoing a lot of daily tweaks

orpgol commented 4 years ago

Yeah this is why we want to use human on our side as well. One thing we noticed is that in human's emotion detection model, the neutral face is detected less accurately than face-api's. I was wondering if you have any idea why that is before we retrain (they were trained on the same dataset as far as I can tell). In other words: could it be related to BlazeFace face detection?

vladmandic commented 4 years ago

no, i haven't looked at the emotion model in detail yet
i just finished tuning last week:

i still need to go over age, gender and emotion

re: blazeface - i doubt it's related to blazeface as bounding boxes it produces seem ok after tuning
more likely it's the quality of model itself. it's always a battle between size and quality
if you know of good and small models, let me know

delebash commented 4 years ago

I am new to ai and training models but @vladmandic mentioned that there are other models that could be used except that they are bigger in size. What models are available and open source? Would it be easy to replace the existing models with larger models if I don't care about code size but would rather have better accuracy?

Thanks.

vladmandic commented 4 years ago

in general, classification models are easy to replace (age, gender, emotion) as their input and output parameters are simple
other models require special handling as their input/output requires special processing

but also note that age/gender/emotion classification models depend on a pre-determined bounding box for the face, so the quality of their output will depend on how good a fit the bounding box detection produces in the first place - that is what i'm analyzing right now

delebash commented 4 years ago

Thanks! Are there other, larger models available that would significantly improve hand, body and face pose?

vladmandic commented 4 years ago

hand, body and face - i'm pretty much locked in on models by now although there is some tuning still left to do.
age, gender, emotion - i'm open to suggestions. current ones are chosen for performance and size.

and yes, performance is a deciding factor. a larger model doesn't have to be more complex and thus less performant, but size is typically a good indicator - a model with a large number of layers and operations has no chance of executing in the required time
and i really want to keep human capable of executing in near-real time (getting as close to 30fps as possible)

i did some research beforehand, but you can look around.

delebash commented 4 years ago

I understand and thank you.

vladmandic commented 4 years ago

i just published an update on git with internal tunables

first, i just added two more variations of emotion model
you can choose one in config.js from emotion-mini, emotion-large and emotion-connect
still no idea which one is the best - could use help in evaluation

second, age & gender model has two variations, one trained on imdb and one on wiki dataset
and you can choose which one is active in config.js

third, i just realized that emotion model prefers inputs in range -1..1 instead of 0..1,
so changing that increased its accuracy - and details like that are not documented anywhere

  const normalize = tf.tidy(() => grayscale.sub(0.5).mul(2));
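For readers without TensorFlow.js at hand, the same rescaling can be sketched on a plain array. The `normalizeRange` name is made up for this illustration; the arithmetic mirrors the `sub(0.5).mul(2)` line above.

```javascript
// Rescale values from the 0..1 range to -1..1:
// subtract the midpoint (0.5), then double the result.
function normalizeRange(values) {
  return values.map((v) => (v - 0.5) * 2);
}

console.log(normalizeRange([0, 0.25, 0.5, 1])); // [ -1, -0.5, 0, 1 ]
```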

fourth, i still don't know what the ideal face crop factors, grayscale conversion or score scale are for it
if you want to play and tune the model, variables are exposed in src/emotion/emotion.js

  const zoom = [0, 0]; // 0..1 meaning 0%..100%
  const rgb = [0.2989, 0.5870, 0.1140]; // factors for red/green/blue colors when converting to grayscale
  const scale = 1; // score multiplication factor
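To show what the `rgb` factors do, here is a plain-array sketch of a weighted grayscale conversion. It assumes pixels are `[r, g, b]` triples already scaled to 0..1; the `toGrayscale` helper is hypothetical and is not how the library itself is written (which operates on tensors).

```javascript
// Weights for red/green/blue, same values as the tunable above.
const rgb = [0.2989, 0.5870, 0.1140];

// Hypothetical helper: converts [r, g, b] pixels (each channel in 0..1)
// to a single luminance value using a weighted sum of the channels.
function toGrayscale(pixels, weights = rgb) {
  return pixels.map(([r, g, b]) => r * weights[0] + g * weights[1] + b * weights[2]);
}

console.log(toGrayscale([[1, 0, 0], [0, 0, 0]])); // [ 0.2989, 0 ]
```

Green carries the largest weight, so changing the factors shifts how strongly each channel contributes to the input the emotion model sees.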

equally, to tune age & gender, edit src/ssrnet/ssrnet.js

  const zoom = [0, 0]; // 0..1 meaning 0%..100%

after changing them, you need to rebuild the model using npm run rebuild; that only takes 1-2 sec
and when testing, best to set minConfidence to a low value to see all outputs

also, age/gender/emotion are subject to skipFrames (really no point of evaluating them all the time), so unless you set it to 0, any detection result will be a little bit delayed.
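The delay comes from reusing a cached result between evaluations. A rough sketch of that idea, assuming `skipFrames` means "number of frames to skip between predictions" (the `createSkipper` helper is hypothetical and not taken from the library's source):

```javascript
// Hypothetical helper: runs `predict` on the first frame, then reuses the
// cached result for the next `skipFrames` frames before predicting again.
function createSkipper(skipFrames, predict) {
  let sinceLast = skipFrames; // forces a prediction on the very first frame
  let cached = null;
  return (input) => {
    if (sinceLast >= skipFrames) {
      cached = predict(input);
      sinceLast = 0;
    } else {
      sinceLast += 1; // skipped frame: return the previous (stale) result
    }
    return cached;
  };
}
```

With `skipFrames` set to 0 every frame is evaluated; with a larger value the returned result can lag behind the current frame by up to that many frames.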

orpgol commented 4 years ago

OK so here is a quick eval of emotion-mini, emotion-large and emotion-connect. I used my face on webcam, comparing both front and back BlazeFace face detection with 0 skipped frames. First of all, after fixing the input range, emotion-large on BlazeFace front now behaves like face-api's model in my comparison. Of the 3 models, emotion-connect is the least accurate on both front and back. Mini and Large are similar in terms of FPS, with Large perhaps about 1 FPS faster on my machine, but nothing noticeable. Large seems slightly more accurate when detecting emotions, but Mini detects the neutral face better.

I'll try and tune the models and report back.

vladmandic commented 4 years ago

thanks, that's useful

no need to switch between front and back blazeface - it either works or it doesn't, it only returns the box coordinates of a face
front is designed for faces that take 70%+ of the screen area, like a video chat on the phone. for everything else, back is better

i'm more curious on the impact of cropping the face to narrow/wide and tall/short box and that impact on the age/gender/emotion (that's why zoom has two params - crop by percentage on x-axis and y-axis).
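As an illustration of the two zoom parameters, the sketch below turns them into a normalized crop box. It assumes each value trims that fraction from both sides of its axis, which is an interpretation and not confirmed by the thread; the `[y1, x1, y2, x2]` layout matches what `tf.image.cropAndResize` expects for its boxes, and the `cropBox` name is hypothetical.

```javascript
// Hypothetical helper: converts zoom = [x, y] into a normalized crop box.
// Assumption: zoom[0] trims that fraction from the left and right edges,
// zoom[1] trims that fraction from the top and bottom edges.
function cropBox(zoom) {
  const [zx, zy] = zoom;
  if (zx < 0 || zy < 0 || zx >= 0.5 || zy >= 0.5) {
    throw new RangeError('zoom values must be in the range 0 <= z < 0.5');
  }
  return [zy, zx, 1 - zy, 1 - zx]; // [y1, x1, y2, x2]
}

console.log(cropBox([0, 0]));    // [ 0, 0, 1, 1 ] (no crop)
console.log(cropBox([0.25, 0])); // [ 0, 0.25, 1, 0.75 ] (narrower box, full height)
```

Varying only one of the two values gives the narrow/wide versus tall/short comparison described above.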

on a side-note, cnn is not connect, it's an old-school convolutional neural network, while the other two are based on the inception network, the difference between small and large being the number of training passes to reinforce learned data

vladmandic commented 4 years ago

i'm closing this issue for now.
any further model optimizations will be handled in dedicated threads.