patrikhuber / superviseddescent

C++11 implementation of the supervised descent optimisation method
http://patrikhuber.github.io/superviseddescent/
Apache License 2.0

reference to "Fitting 3D Morphable Models using Local Features" seems to be misleading #29

Closed mkutny closed 8 years ago

mkutny commented 8 years ago

Honestly, I'm not deep into research papers, but it seems to me that "Fitting 3D Morphable Models using Local Features" describes an approach where landmarks are detected through 3DMM fitting: "In essence, the proposed method can also be seen as a 3D model based landmark detection, steering the 3D model parameters to converge to the ground truth location."

At the same time, robust cascaded-regression landmark detection is described in another paper: "Random Cascaded-Regression Copse for Robust Facial Landmark Detection".

patrikhuber commented 8 years ago

Hi!

This library is not only about landmark detection: as the title says, it's a C++11 implementation of the supervised descent optimisation method, and it's quite generic. In fact, the examples/ directory contains first-run examples for approximating an arbitrary function, 3D pose estimation with a face model, and landmark detection. You are right that this code doesn't directly contain an implementation of the "Fitting 3D Morphable Models using Local Features" paper. However, this library is used as the foundation of the algorithm presented in that paper; it's the "heart" of the algorithm. So I would disagree that it's "misleading". That said, we're happy if you'd rather cite our other paper, "Random Cascaded-Regression Copse for Robust Facial Landmark Detection", instead. I'm also open to suggestions if you still think it's misleading.
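To illustrate what "generic" means here, the idea behind the "approximating an arbitrary function" example can be sketched in a few dozen lines. This is a minimal, self-contained sketch and does not use the superviseddescent API: `LinearStage`, `fit_stage`, `train_cascade` and `predict` are hypothetical names, and the scalar feature h(x) - y with a plain least-squares regressor per stage is a simplifying assumption. Each stage learns a linear map from the current feature to the remaining error, so a short cascade learns to solve h(x) = y without ever computing a derivative of h.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One stage of the cascade: a scalar linear update dx = r * feature + b.
// Hypothetical type for this sketch; not the superviseddescent API.
struct LinearStage {
    double r = 0.0;
    double b = 0.0;
};

// Ordinary least squares for dx ~ r * f + b (one feature plus an intercept).
LinearStage fit_stage(const std::vector<double>& f, const std::vector<double>& dx) {
    assert(f.size() == dx.size() && !f.empty());
    const double n = static_cast<double>(f.size());
    double mean_f = 0.0, mean_dx = 0.0;
    for (std::size_t i = 0; i < f.size(); ++i) {
        mean_f += f[i];
        mean_dx += dx[i];
    }
    mean_f /= n;
    mean_dx /= n;
    double cov = 0.0, var = 0.0;
    for (std::size_t i = 0; i < f.size(); ++i) {
        cov += (f[i] - mean_f) * (dx[i] - mean_dx);
        var += (f[i] - mean_f) * (f[i] - mean_f);
    }
    LinearStage s;
    s.r = var > 0.0 ? cov / var : 0.0;
    s.b = mean_dx - s.r * mean_f;
    return s;
}

// Trains a cascade that learns to solve h(x) = y for x, starting every sample at x0.
// Assumed feature: h(x_k) - y. Records the training RMS error before and after each stage.
template <typename H>
std::vector<LinearStage> train_cascade(H h, const std::vector<double>& x_true, double x0,
                                       int num_stages, std::vector<double>& rms_per_stage) {
    assert(!x_true.empty());
    const std::size_t n = x_true.size();
    std::vector<double> y(n), x(n, x0), f(n), dx(n);
    for (std::size_t i = 0; i < n; ++i) y[i] = h(x_true[i]);
    auto rms = [&] {
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) sum += (x_true[i] - x[i]) * (x_true[i] - x[i]);
        return std::sqrt(sum / static_cast<double>(n));
    };
    std::vector<LinearStage> cascade;
    rms_per_stage.assign(1, rms());
    for (int k = 0; k < num_stages; ++k) {
        for (std::size_t i = 0; i < n; ++i) {
            f[i] = h(x[i]) - y[i];    // feature at the current estimate
            dx[i] = x_true[i] - x[i]; // remaining error = regression target
        }
        const LinearStage s = fit_stage(f, dx);
        for (std::size_t i = 0; i < n; ++i) x[i] += s.r * f[i] + s.b;
        cascade.push_back(s);
        rms_per_stage.push_back(rms());
    }
    return cascade;
}

// Applies a trained cascade to a new target value y, starting from x0.
template <typename H>
double predict(const std::vector<LinearStage>& cascade, H h, double y, double x0) {
    double x = x0;
    for (const LinearStage& s : cascade) x += s.r * (h(x) - y) + s.b;
    return x;
}
```

Trained on evenly spaced samples of sin on [-1, 1] with x0 = 0 and a few stages, `predict(cascade, h, std::sin(0.5), 0.0)` should land close to 0.5. Because each stage is a least-squares fit that could also pick r = b = 0, the training error cannot increase from one stage to the next. The library itself handles vector-valued parameters and features; this scalar version only shows the principle.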

mkutny commented 8 years ago

My understanding is that the 4dface logic is the following:

Now with regard to the references:

As far as I can see, all the references leave out landmark detection. And the problem with "Fitting 3DMM ..." is that it's written as though landmark detection were not needed for pose estimation/shape fitting (suggesting quite the opposite: that PE/SF could be used for LD).

Is my current understanding of the code's logic correct, or am I missing something?

patrikhuber commented 8 years ago

My understanding is that the 4dface logic is the following: ...

Yes, that's correct!

As far as I can see, all the references leave out landmark detection.

In the case of 4dface, this is kind of intentional. The landmark detection is not really what sets 4dface apart; it's just a necessity. Actually, you can plug in any landmark detection, for example a commercial one or dlib, and it might even run better. However, we provide it with our RCR model (which is quite good too!) to offer a "complete package". It's true that we could add "Random Cascaded-Regression Copse for Robust Facial Landmark Detection" to the references list (also on the superviseddescent page); I'll think about that. It's our paper too anyway ;-)

And the problem with "Fitting 3DMM ..." is that it's written as though landmark detection were not needed for pose estimation/shape fitting (suggesting quite the opposite: that PE/SF could be used for LD).

Ah! I think here you might indeed be misunderstanding that algorithm. In "Fitting 3DMM ...", we directly estimate the pose and shape parameters from local features (initialised with a face box). No landmarks are needed. Of course, once we have obtained the pose and shape parameters through the regressors, we can trivially render the model and project whichever landmarks we want.
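To make the last point concrete: once pose and shape parameters are known, landmarks come out of a plain forward computation. The sketch below assumes a linear (PCA-style) shape model and a scaled orthographic camera; `ShapeModel`, `Pose`, `shape_vertex` and `project_landmarks` are hypothetical names for illustration, not the eos API, and the landmark vertex indices would come from whatever landmark-to-vertex mapping the model provides.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec2 = std::array<double, 2>;
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Hypothetical linear shape model: vertex v = mean[v] + sum_j alpha[j] * basis[j][v].
struct ShapeModel {
    std::vector<Vec3> mean;
    std::vector<std::vector<Vec3>> basis;
};

// Hypothetical scaled orthographic camera: p = s * (first two rows of R) * v + t.
struct Pose {
    Mat3 R;
    double s;
    Vec2 t;
};

// Builds one vertex of the shape instance defined by the coefficients alpha.
Vec3 shape_vertex(const ShapeModel& model, const std::vector<double>& alpha, std::size_t v) {
    assert(v < model.mean.size() && alpha.size() <= model.basis.size());
    Vec3 p = model.mean[v];
    for (std::size_t j = 0; j < alpha.size(); ++j)
        for (std::size_t c = 0; c < 3; ++c) p[c] += alpha[j] * model.basis[j][v][c];
    return p;
}

// Projects the chosen landmark vertices of the fitted model into the image.
std::vector<Vec2> project_landmarks(const ShapeModel& model, const std::vector<double>& alpha,
                                    const Pose& pose,
                                    const std::vector<std::size_t>& landmark_vertices) {
    std::vector<Vec2> landmarks;
    landmarks.reserve(landmark_vertices.size());
    for (std::size_t v : landmark_vertices) {
        const Vec3 p = shape_vertex(model, alpha, v);
        Vec2 q{};
        for (std::size_t r = 0; r < 2; ++r) {
            double rotated = 0.0;
            for (std::size_t c = 0; c < 3; ++c) rotated += pose.R[r][c] * p[c];
            q[r] = pose.s * rotated + pose.t[r];
        }
        landmarks.push_back(q);
    }
    return landmarks;
}
```

Nothing in this step depends on detected landmarks: any set of vertices can be chosen after the fact, which is why landmarks are an output of the direct fitting rather than an input.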

However, as you noticed, this algorithm is not directly used in 4dface (or eos). Rather, as I mentioned earlier, eos and superviseddescent are used as the foundation of the "Fitting 3DMM ..." algorithm, which is why I think the reference is appropriate. I agree with you that for 4dface we should probably put a better-fitting reference, specific to 4dface, but we don't have one yet, so the other papers are kind of the "best fit" and give us credit for our work (which I think is again appropriate).

Feel free to ask if you have further questions or concerns.

mkutny commented 8 years ago

The first time I read "Fitting 3DMM ...", I was sure it described an algorithm to fit the face without landmarks. Since SD was used in 4dface, that set my expectation of a markerless implementation. It turned out not to be one, so I inferred that "Fitting 3DMM ..." was misleading.

Finally it cleared up: SD had been prepared for "Fitting 3DMM ..." but another implementation (4dface) was based on SD instead.

Now I wonder why 4dface doesn't use SD as intended in "Fitting 3DMM"? Is it just a step towards a "Fitting 3DMM" implementation?

patrikhuber commented 8 years ago

Finally it cleared up

Great! :-) So I think your biggest misconception was that SD is a particular algorithm from only one paper, when in fact it's an optimisation algorithm (like gradient descent) that can be used for many tasks: depending on how you set it up, for "traditional" landmark detection, for "markerless" face fitting, and for much more.
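To make "depending on how you set it up" concrete: in the standard SDM formulation (Xiong and De la Torre), a cascade of learned linear maps repeatedly updates the current estimate from features computed at that estimate. The notation below is a sketch of that general form, not a transcription of the library's code:

```latex
\mathbf{x}_{k+1} = \mathbf{x}_k + R_k\,\boldsymbol{\phi}(\mathbf{x}_k) + \mathbf{b}_k,
\qquad k = 0, \dots, K-1
```

For "traditional" landmark detection, x would hold the stacked 2D landmark coordinates and φ(x) local features (e.g. HOG) extracted around them; for "markerless" fitting as in "Fitting 3DMM ...", x would hold the pose and shape parameters and φ(x) local features extracted at points projected from the current model. Only the choice of x and φ changes; the optimiser stays the same.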

SD had been prepared for "Fitting 3DMM ..." but another implementation (4dface) was based on SD instead.

Well, more or less. The superviseddescent library was initially developed for "traditional" landmark detection, but yes, we always had an algorithm like the one in "Fitting 3DMM ..." in mind, which is why we developed the framework in a generic way. But this point is not really important.

Now I wonder why 4dface doesn't use SD as intended in "Fitting 3DMM"?

One "problem" with the "direct" estimate in "Fitting 3DMM ..." is that you need ground-truth camera and shape information, which you don't really have, or at least not much of it, and not for "realistic" in-the-wild databases like ibug/LFPW/Helen etc. There are some ways around it, and some interesting things I'd like to try, but it's rather low priority and I never got to it. If you're interested in that, you can have a look at Stan Li's paper; he published more or less the same idea at around the same time. In any case, for the purpose of 4dface, it was much easier to just train a "traditional" landmark detector on 3000 ibug images with superviseddescent (like the RCR) and be done with it. It has the additional advantage that you can plug in any landmark detector if you have a better one (for example, a commercial one).
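The ground-truth requirement follows from how each stage is trained. In the usual SDM training (a sketch; in practice a regularisation term is commonly added), stage k solves a least-squares problem over N training samples, where x_i^k is the current estimate for sample i, φ the feature function, and x_i^* the ground truth for that sample:

```latex
(R_k, \mathbf{b}_k) = \arg\min_{R,\,\mathbf{b}} \sum_{i=1}^{N}
\left\lVert \mathbf{x}_i^{*} - \mathbf{x}_i^{k} - R\,\boldsymbol{\phi}(\mathbf{x}_i^{k}) - \mathbf{b} \right\rVert_2^2
```

If x are 2D landmarks, x_i^* are simply the annotations that ibug/LFPW/Helen provide. If x are pose and shape parameters, x_i^* must be ground-truth camera and shape parameters for every training image, which is exactly what those in-the-wild databases lack.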

mkutny commented 8 years ago

Thanks for the reference, I'll definitely look at the paper!

patrikhuber commented 8 years ago

I'll close this issue. Feel free to re-open or open a new one if you have further questions.