Closed - orpgol closed this issue 4 years ago
I use that approach in my apps; I just didn't see the need here, as each model is hand-picked and further optimized for use in human.
If you have a specific model in mind, I'm listening...
We wanted to compare the emotion detection model you have here with the one available in face-api.
those are very different
face-api loads weights and performs most processing in the library itself - it's not even a full model on its own
plus each model has very different output definitions that require different processing, so loading a different model can't work if there is no code to parse its output
best is to run human and face-api on the same set of test images so the comparison is on identical ground - that is something on my to-do list (not just for emotion, but for age and overall performance)
yep that makes sense. thanks!
i've spent significant time working on face-api: making it work with newer tfjs, fixing broken models and rebuilding with modern typescript
in the end, i created a drastically updated fork: https://github.com/vladmandic/face-api
but the models face-api uses have not been maintained for a long time, so any further improvement is impossible
so i started a brand new project instead - that's how human started
and yes, it's still in pre-release and undergoing a lot of daily tweaks
Yeah this is why we want to use human on our side as well. One thing we noticed is that in human's emotion detection model, the neutral face is detected less accurately than face-api's. I was wondering if you have any idea why that is before we retrain (they were trained on the same dataset as far as I can tell). In other words: could it be related to BlazeFace face detection?
no, i haven't looked at the emotion model in detail yet
i just finished tuning last week
i still need to go over age, gender and emotion
re: blazeface - i doubt it's related, as the bounding boxes it produces seem ok after tuning
more likely it's the quality of the model itself. it's always a battle between size and quality
if you know of good and small models, let me know
I am new to AI and training models, but @vladmandic mentioned that there are other models that could be used, except that they are bigger in size. What models are available and open source? Would it be easy to replace the existing models with larger ones if I don't care about model size but would rather have better accuracy?
Thanks.
in general, classification models are easy to replace (age, gender, emotion) as their input and output parameters are simple
other models require special handling as their input/output requires special processing
but also note that age/gender/emotion classification models depend on a pre-determined bounding box for the face, so the quality of their output will depend on how good a fit the bounding box detection produces in the first place - that is what i'm analyzing right now
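To illustrate why classification outputs are simple to handle: the model emits one score per class, and parsing is just a softmax plus argmax. A minimal sketch - the label list here is illustrative, not necessarily the order human's models use:

```javascript
// minimal sketch: parsing a generic classification output
// (labels are illustrative; real models define their own class order)
const labels = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise'];

// softmax: turn raw logits into probabilities that sum to 1
function softmax(logits) {
  const max = Math.max(...logits); // subtract max for numeric stability
  const exps = logits.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((v) => v / sum);
}

// pick the best class and its score
function classify(logits) {
  const probs = softmax(logits);
  const best = probs.indexOf(Math.max(...probs));
  return { emotion: labels[best], score: probs[best] };
}
```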
Thanks! Are there other larger models available that would improve hand, body and face pose significantly?
hand, body and face - i'm pretty much locked in on models by now, although there is some tuning still left to do.
age, gender, emotion - i'm open to suggestions. current ones were chosen for performance and size.
and yes, performance is a deciding factor. a larger model doesn't have to mean it's more complex and thus less performant, but it's typically a good indicator - a model with a large number of layers and operations has no chance of executing in the required time
and i really want to keep human capable of executing in near-real time (getting as close to 30fps as possible)
i did some research beforehand, but you can look around.
I understand and thank you.
i just published an update on git with internal tunables
first, i added two more variations of the emotion model
you can choose one in config.js from emotion-mini, emotion-large and emotion-connect
still no idea which one is the best - could use help in evaluation
second, the age & gender model has two variations, one trained on the imdb and one on the wiki dataset, and you can choose which one is active in config.js
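For reference, selecting the active variants in config.js presumably looks something like this - the key names and model paths below are my guess from the description, not verified against the shipped config.js:

```javascript
// hypothetical excerpt of config.js - key names and paths are assumed,
// verify against the real config.js in the repo
export default {
  face: {
    emotion: {
      enabled: true,
      modelPath: '../models/emotion-large.json', // or the emotion-mini / emotion-connect variants
    },
    age: {
      enabled: true,
      modelPath: '../models/ssrnet-imdb.json', // or the wiki-trained variant
    },
  },
};
```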
third, i just realized that the emotion model prefers inputs in the range -1..1 instead of 0..1,
so changing that increased its accuracy - and details like that are nowhere documented
const normalize = tf.tidy(() => grayscale.sub(0.5).mul(2));
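In per-pixel terms, that tf.tidy line is just a linear shift and scale mapping 0..1 to -1..1:

```javascript
// per-pixel equivalent of grayscale.sub(0.5).mul(2): remap 0..1 to -1..1
const normalize = (v) => (v - 0.5) * 2;

normalize(0);   // -1
normalize(0.5); //  0
normalize(1);   //  1
```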
fourth, i still don't know the ideal face crop factors, grayscale conversion weights or score scale for it
if you want to play and tune the model, variables are exposed in src/emotion/emotion.js
const zoom = [0, 0]; // 0..1 meaning 0%..100%
const rgb = [0.2989, 0.5870, 0.1140]; // factors for red/green/blue colors when converting to grayscale
const scale = 1; // score multiplication factor
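Those rgb factors are the standard BT.601 luma weights; a plain-JS sketch of the grayscale conversion they drive:

```javascript
// convert one RGB pixel to grayscale using the BT.601 luma weights above
const rgbWeights = [0.2989, 0.5870, 0.1140];

function toGrayscale(r, g, b) {
  // weighted sum approximates perceived brightness (green dominates)
  return rgbWeights[0] * r + rgbWeights[1] * g + rgbWeights[2] * b;
}
```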
equally, to tune age & gender, edit src/ssrnet/ssrnet.js
const zoom = [0, 0]; // 0..1 meaning 0%..100%
you need to rebuild after changing them using npm run rebuild - that only takes 1-2 sec
and when testing, it's best to set minConfidence to a low value to see all outputs
also, age/gender/emotion are subject to skipFrames (there's really no point in evaluating them on every frame), so unless you set it to 0, any detection result will be slightly delayed.
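The skipFrames behavior can be sketched as a small cache - a simplified model of running the classifier only every N frames, not human's actual code:

```javascript
// simplified sketch of skipFrames caching - not human's actual implementation
function makeSkipper(skipFrames, detect) {
  let last = null;
  let counter = 0;
  return (input) => {
    // re-run detection only on the first call or once the skip budget is used up
    if (last === null || counter >= skipFrames) {
      last = detect(input);
      counter = 0;
    } else {
      counter += 1; // reuse cached result - this is why output lags slightly
    }
    return last;
  };
}
```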
OK, so here is a quick eval of emotion-mini, emotion-large and emotion-connect. I used my face on webcam, comparing both front and back BlazeFace face detection with 0 skipped frames. First of all, after the range fix, emotion-large on BlazeFace front now behaves like face-api's model in my comparison. Among the three models, emotion-connect is the worst in terms of accuracy on both front and back. Mini and large are similar in terms of FPS, with large perhaps a slight 1 FPS ahead on my machine, but nothing noticeable. Large seems to have slightly better accuracy when detecting emotions, but mini detects a neutral face better.
I'll try and tune the models and report back.
thanks, that's useful
no need to switch between front and back blazeface - it either works or it doesn't; it only returns the box coordinates of a face
front is designed for faces that take up 70%+ of the screen area, like a video chat on a phone. for everything else, back is better
i'm more curious about the impact of cropping the face to a narrow/wide and tall/short box, and the impact of that on age/gender/emotion (that's why zoom has two params - crop by percentage on the x-axis and y-axis).
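My reading of those two zoom params as a centered crop, as a sketch rather than the library's exact code:

```javascript
// shrink a face box by a percentage on each axis, keeping it centered
// box = [x, y, width, height]; zoom = [zoomX, zoomY], each 0..1 meaning crop 0%..100%
function applyZoom(box, zoom) {
  const [x, y, width, height] = box;
  const dx = (width * zoom[0]) / 2;  // trim half the cropped width from each side
  const dy = (height * zoom[1]) / 2; // trim half the cropped height from top and bottom
  return [x + dx, y + dy, width - width * zoom[0], height - height * zoom[1]];
}
```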
on a side-note, cnn does not stand for connect - it's an old-school convolutional neural network, while the other two are based on the inception network, the difference between mini and large being the number of training passes to reinforce learned data
i'm closing this issue for now.
any further model optimizations will be handled in dedicated threads.
Hi, I was wondering if there is any reason not to support tf.loadLayersModel, which would open up the possibility of using more of the pretrained models that are available.
Would it make sense to add a condition and a definition of layers vs. graph model to the config file?
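A rough sketch of what such a condition could look like - modelType is a hypothetical config field (not part of human's actual config), while tf.loadLayersModel and tf.loadGraphModel are the real tfjs entry points:

```javascript
// hypothetical sketch: pick the tfjs loader from a config flag
// 'modelType' is an assumed field name, not part of human's actual config
function loadAnyModel(tf, cfg) {
  return cfg.modelType === 'layers'
    ? tf.loadLayersModel(cfg.modelPath) // Keras-style layers model
    : tf.loadGraphModel(cfg.modelPath); // frozen graph model (current default)
}
```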