uricamic / clandmark

Open Source Landmarking Library
http://cmp.felk.cvut.cz/~uricamic/clandmark
GNU General Public License v3.0

Face Rectangle size #76

Open mousomer opened 7 years ago

mousomer commented 7 years ago

I am trying to evaluate the clandmark models with different face detectors, which scale their face rectangles differently. For example, one may detect a face at [200 100 60 60] pixels and another at [190 90 80 80]. Does it make a difference which face rectangle I send over to the JOINTMV detectors? Should I retrain the models for different face detectors?

uricamic commented 7 years ago

Hi @mousomer,

the input image is rescaled internally to a fixed resolution, the so-called normalized frame. If the face is smaller than this frame (you can check the precise size in the .xml models, where the <bw_width> and <bw_height> tags define it), the detection will be "more precise". When the face is bigger, a systematic error is introduced by scaling the image down to this fixed resolution.
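A minimal sketch of reading the normalized frame size from a model file. Only the tag names `bw_width` and `bw_height` come from the reply above; the surrounding XML layout, the sample values, and the helper name `read_normalized_frame` are assumptions for illustration, so the code simply searches for the two tags anywhere in the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical model fragment: only the two tag names are taken from the thread.
# The root element and the values are made up for this example.
SAMPLE_MODEL = """<model>
  <bw_width>40</bw_width>
  <bw_height>40</bw_height>
</model>"""


def read_normalized_frame(xml_text):
    """Return (width, height) of the normalized frame found in xml_text."""
    root = ET.fromstring(xml_text)
    # ".//tag" matches the tag at any depth below the root element.
    width = root.find(".//bw_width")
    height = root.find(".//bw_height")
    if width is None or height is None:
        raise ValueError("bw_width / bw_height not found in model XML")
    return int(float(width.text)), int(float(height.text))


frame_size = read_normalized_frame(SAMPLE_MODEL)
print(frame_size)
```

For a real model, the file contents would be passed in place of `SAMPLE_MODEL`; whether the tags sit at the depth shown here is not something this thread confirms.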

In short, if the images are bigger, retraining with a bigger normalized frame would increase the precision. However, the bigger the normalized frame is, the slower the detection (and therefore also the training of the model) would be.

If you have any further questions, please do not hesitate to ask them either here or by email.

mousomer commented 7 years ago

I see, thanks. From reviewing the code I had the impression that the NormalizedFrame has a constant size (per model type). Was I wrong?

uricamic commented 7 years ago

Hi @mousomer,

yes, it has a constant size per model type. But the input image is always rescaled to this size, so it can detect landmarks on faces of "arbitrary" size. However, the detection precision is influenced by, among other things, the normalized frame size.

mousomer commented 7 years ago

So there is an optimal face size per model?

uricamic commented 7 years ago

Yep, we could call faces that are the same size as the model's normalized frame (or smaller) optimal, because there is no precision loss due to downscaling.

However, it is definitely not necessary to have huge normalized frames. Look, for example, at the results of CLandmark in the 300-W and 300-VW challenges, where the face size per example was very big. Our solution, C2F-DPM, used a normalized frame of 80 x 80 px for the coarse detector and 160 x 160 px for the fine one.
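To make "optimal" concrete, here is a small sketch that compares a face box size with a normalized frame size. The function names are hypothetical, the 80 x 80 px frame is the coarse-detector size quoted in this thread, and the calculation ignores any margin the library may add around the face box before rescaling.

```python
def downscale_factor(face_size, frame_size):
    """Input pixels per normalized-frame pixel along each axis.

    Values <= 1.0 mean the face is no larger than the frame on that axis,
    so no downscaling is needed there.
    """
    face_w, face_h = face_size
    frame_w, frame_h = frame_size
    return face_w / frame_w, face_h / frame_h


def is_optimal(face_size, frame_size):
    """True when the face fits into the normalized frame without downscaling."""
    fx, fy = downscale_factor(face_size, frame_size)
    return fx <= 1.0 and fy <= 1.0


frame = (80, 80)  # coarse-detector frame size mentioned in the thread

small_face = (60, 60)    # box size from the original question
large_face = (200, 200)  # made-up larger face for comparison

print(downscale_factor(small_face, frame), is_optimal(small_face, frame))
print(downscale_factor(large_face, frame), is_optimal(large_face, frame))
```

With the larger face, each pixel of the normalized frame covers about 2.5 input pixels per axis, which is the kind of systematic error described earlier in the thread.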

mousomer commented 7 years ago

Thanks. Well, the problem I'm having is with the joint MV models (profiles and half-profiles). Suppose the vertical distance eyes-to-mouth is 100 pixels. What box should I send over to detect_optimized?

uricamic commented 7 years ago

I would first go with the detected face size and check the results. Only if they were not satisfactory would I start thinking about re-training the model.

The learning scripts for the jointmv model are very time-consuming. I have some unpublished improvements which reduce the time from 2 weeks to 2 days for the current model, but those will require some time before being published. Both variants are also quite memory-hungry (around 20 GB of RAM is needed).

mousomer commented 7 years ago

Ah, but I'm trying to work with third-party detectors. I guess I could run the OpenCV cascade first and gather statistics from there.
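One way the "gather statistics" idea could look, as a rough sketch rather than anything the library provides: run both detectors on the same faces, estimate the median scale and centre shift between their boxes, and apply that mapping to new third-party boxes. The function names are hypothetical, boxes are assumed to be (x, y, w, h), the scale is taken from widths only, and which detector counts as the reference is an assumption.

```python
from statistics import median


def fit_box_mapping(source_boxes, reference_boxes):
    """Estimate median scale and centre offsets between paired (x, y, w, h) boxes."""
    scales, dxs, dys = [], [], []
    for (sx, sy, sw, sh), (rx, ry, rw, rh) in zip(source_boxes, reference_boxes):
        # Width ratio only: assumes both detectors keep a similar aspect ratio.
        scales.append(rw / sw)
        # Centre shift, expressed relative to the source box size.
        dxs.append(((rx + rw / 2) - (sx + sw / 2)) / sw)
        dys.append(((ry + rh / 2) - (sy + sh / 2)) / sh)
    return median(scales), median(dxs), median(dys)


def map_box(box, mapping):
    """Apply a fitted (scale, dx, dy) mapping to one (x, y, w, h) box."""
    x, y, w, h = box
    scale, dx, dy = mapping
    cx = x + w / 2 + dx * w
    cy = y + h / 2 + dy * h
    new_w, new_h = w * scale, h * scale
    return (cx - new_w / 2, cy - new_h / 2, new_w, new_h)


# The two example boxes from the first post, treated as one paired detection.
third_party = [(200, 100, 60, 60)]
reference = [(190, 90, 80, 80)]

mapping = fit_box_mapping(third_party, reference)
print(mapping)
print(map_box((200, 100, 60, 60), mapping))
```

In practice the mapping would be fitted on many paired detections, and the spread of the per-face values would show whether a single scale and offset is a reasonable summary.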

uricamic commented 7 years ago

Yeah, I haven't tried OpenCV cascades for profiles yet myself, but it should surely be possible.

mousomer commented 7 years ago

That's not what you're using for [pre-model] detection? (I was assuming that's the right thing to do because that's what you use in the static_input.cpp example).

uricamic commented 7 years ago

Nope, I was using a commercial face detector (http://www.eyedea.cz/) for the development of the landmark detector. It provides square face boxes for faces at arbitrary yaw angles.