mousomer opened this issue 7 years ago
Hi @mousomer,
the input image is rescaled internally to a fixed resolution, the so-called normalized frame. So, if the face is smaller than this (you can check the precise size in the .xml models; there are `<bw_width>` and `<bw_height>` tags defining it), the detection will be "more precise", and when the face is bigger, a systematic error is introduced by scaling the image down to this fixed resolution.
In short, if the images are bigger, retraining with a bigger normalized frame would increase the precision. However, the bigger the normalized frame is, the slower the detection (and therefore also the training of the model) will be.
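To make the downscaling argument concrete, here is a small sketch (the numbers are illustrative only, not taken from any shipped model) of how a localization error inside the normalized frame maps back onto the original face:

```python
# Illustrative sketch: a face crop is squeezed into the fixed
# normalized frame, landmarks are found there, and the result is
# mapped back. Any error made inside the frame is magnified by the
# downscaling factor on the way back.

def backprojected_error_px(face_size_px, frame_size_px, frame_error_px=1.0):
    """A localization error of `frame_error_px` pixels inside the
    normalized frame corresponds to this many pixels on the original
    face after scaling back up."""
    scale = face_size_px / frame_size_px
    return frame_error_px * scale

# A 400 px face in an 80 px normalized frame: every pixel of error in
# the frame becomes 5 px of error on the original image.
print(backprojected_error_px(400, 80))   # 5.0

# A face already at (or below) the frame size loses no precision this way.
print(backprojected_error_px(60, 80))    # 0.75
```

This is why retraining with a bigger normalized frame helps only when the input faces are actually bigger than the frame.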
In case of any further questions, please do not hesitate to ask them either here or by email.
I see, thanks. From reviewing the code I had the impression that the NormalizedFrame was a constant size (per model type). Was I wrong?
Hi @mousomer,
yes, it is a constant size per model type. But the input image is always rescaled to this size, so it can detect landmarks on "arbitrary"-sized faces; however, the detection precision is, among other things, also influenced by the normalized frame size.
So there is an optimal face size per model?
Yep, we could call faces which are the same size as (or smaller than) the model's normalized frame optimal, because there is no precision loss due to downscaling.
However, it is definitely not necessary to have very huge normalized frames. Look, for example, at the results of CLandmark in the 300-W and 300-VW challenges, where the face size per example was very big. Our solution C2F-DPM used a normalized frame of 80 x 80 px for the coarse detector and 160 x 160 px for the fine one.
Thanks. Well, the problem I'm having is with the joint MV models (profiles and half-profiles). Suppose the vertical eyes-to-mouth distance is 100 pixels. What box should I send over to `detect_optimized`?
I would go first for the detected face size, check the results and only if they were not satisfactory enough, I would start thinking about re-training the model.
The learning scripts for the jointmv model are very time demanding. I have some unpublished improvements which reduce the time from 2 weeks to 2 days for the current model. But those will require some time before being published. And both variants are quite heavy on memory requirements (around 20GB RAM is needed).
Ah, but I'm trying to work with third-party detectors. I guess I could run the OpenCV cascade first and gather statistics from there.
Yeah, I haven't tried OpenCV cascades for profiles myself yet, but it should surely be possible.
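One way to gather those statistics, sketched under the assumption that you have ground-truth eye/mouth annotations for some detected faces (the sample numbers below are made up for illustration, not real detector output):

```python
from statistics import mean

# Hypothetical sketch: calibrate the relation between a detector's box
# size and the eyes-to-mouth distance, so you know roughly what box
# that detector would produce for a face of known scale.

samples = [
    # (detector box side in px, eyes-to-mouth distance in px) - made up
    (120, 62),
    (80, 40),
    (200, 104),
]

# Average box-side / eyes-to-mouth ratio for this particular detector.
ratio = mean(box / dist for box, dist in samples)

def expected_box_side(eyes_to_mouth_px):
    """Estimate the box side this detector would report for a face
    whose eyes-to-mouth distance is known."""
    return ratio * eyes_to_mouth_px

print(round(expected_box_side(100)))  # 195 for this made-up data
```

With such a per-detector ratio you could also answer the earlier question (what box corresponds to a 100 px eyes-to-mouth distance) for whichever detector you end up using.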
That's not what you're using for [pre-model] detection? (I was assuming that's the right thing to do because that's what you use in the static_input.cpp example.)
Nope, I was using the commercial face detector (http://www.eyedea.cz/) for the development of the landmark detector. It provides square face sizes for arbitrary yaw angle oriented faces.
I am trying to evaluate the clandmark models with different face detectors. They scale the face box differently: for example, one may detect a face at [200 100 60 60] pixels, and the other at [190 90 80 80]. Should it make a difference which face rectangle I send over to the JOINTMV detectors? Should I retrain the models for different face detectors?
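One common workaround, sketched here under the assumption that both detectors roughly center their box on the same face (which happens to hold for the two example boxes above, both centered at (230, 130)), is to rescale one detector's box around its center to match the other's convention:

```python
# Sketch: map one detector's box convention onto another's by rescaling
# around the box center. The 80/60 factor below comes from the example
# boxes in this thread, not from any published calibration.

def rescale_box(box, factor):
    """box = (x, y, w, h); grow or shrink it by `factor` around its center."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    nw, nh = w * factor, h * factor
    return (cx - nw / 2, cy - nh / 2, nw, nh)

# Detector A reports (200, 100, 60, 60); detector B reports (190, 90, 80, 80).
# Both boxes share the center (230, 130), so a scale factor of 80/60
# maps A's convention onto B's.
print(rescale_box((200, 100, 60, 60), 80 / 60))  # (190.0, 90.0, 80.0, 80.0)
```

Whether such a fixed factor is enough, or whether retraining per detector is needed, would depend on how consistently each detector frames the face.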