ksachdeva / dlib-to-tf-keras-converter

A set of scripts to convert dlib's face recognition network to tensorflow, keras, onnx etc
Apache License 2.0

Fine-tuning? #2

Open jrosebr1 opened 5 years ago

jrosebr1 commented 5 years ago

Really nice job, @ksachdeva! Congratulations on this implementation -- it's really nice!

I was curious about fine-tuning with Keras. For example, let's say we wanted to:

  1. Take the dlib model definition and trained weights
  2. Convert them to Keras
  3. And then use Keras to fine-tune a model on data the original model wasn't originally trained on

Have you experimented at all with that use case?

I'd really love to help with such a project so please do get back to me πŸ˜„

ksachdeva commented 5 years ago

Thanks Adrian for the kind words.

Indeed, I have thought about fine-tuning; however, there are a few challenges and aspects that make the exercise (in this particular context) not very fruitful.

First and foremost, the weights of the batch normalization layers are not available for this network. Davis King mentions in one of his forum posts that he accidentally overwrote the training weights file and lost them. Without those lost parameters, fine-tuning would be invalid.

When I ported the architecture to Keras I kept the BN layers so that the network can be trained from scratch. Note that during inference the BN layer is replaced by a Scale (also called Affine) layer.
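To illustrate that replacement, here is a minimal NumPy sketch (illustrative values only, not taken from the actual network) of how a trained BN layer's parameters and running statistics fold into a single fixed scale and shift at inference:

```python
import numpy as np

# Illustrative values; in the real network these would come from the
# trained BatchNormalization layer (gamma, beta, moving mean, moving variance).
gamma, beta = 1.2, 0.3
moving_mean, moving_var, eps = 0.5, 4.0, 1e-5

x = np.array([0.0, 1.0, 2.0])

# Standard BN inference formula
bn_out = gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta

# Folded into a fixed affine (scale + shift) layer
scale = gamma / np.sqrt(moving_var + eps)
shift = beta - scale * moving_mean
affine_out = scale * x + shift

assert np.allclose(bn_out, affine_out)
```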

Now, for a moment, consider that the weights had not been lost. Even then, fine-tuning this particular network would not reap a lot of benefits.

Here is my humble reasoning for the above argument:

Fine-tuning makes sense if you are dealing with a large network and retraining takes a lot of time. In this case the network is relatively small (only 29 layers). On a single 1080 Ti it takes me about 10 hours to train on about 3 million (aligned) images.

Now, let's again remove small vs. large network and training time from the equation. You may still want to adjust (fine-tune) the network for new images (new classes of human faces in this particular case).

The reason the above approach does not excite me is that, typically, for face recognition you plug an SVM on top of the face representations (embeddings). In my experiments, the SVM has worked really well and often even compensates for the accuracy limitations of this particular network.
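For illustration, here is a minimal sketch of that approach, assuming you already have 128-d embeddings produced by the converted model; the scikit-learn usage and all variable names are illustrative stand-ins, not part of this repo:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 128))    # stand-in for real 128-d face embeddings
y = rng.integers(0, 5, size=200)   # stand-in for identity labels

# Train a linear SVM on top of the embeddings
clf = SVC(kernel="linear", probability=True)
clf.fit(X, y)

# At recognition time: embed the new face, then classify the embedding
new_embedding = rng.normal(size=(1, 128))
pred = clf.predict(new_embedding)
```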

I hope these arguments make sense. Again, this is based on my understanding and experiments ... I am not an authority on this subject, so please feel free to challenge and suggest if you see it differently.


jrosebr1 commented 5 years ago

For face recognition, yes, you could take the 128-d embeddings from the network and then train a Logistic Regression or SVM on top of those representations. That can and will work for a small number of new face identities to recognize.

To make the method more robust, however, one could fine-tune the model on a new dataset of example images. This new dataset would be smaller both in terms of (1) total images and (2) total number of unique individuals. It may also be impossible to train such a network from scratch using that dataset alone.

There is an "in-between" situation where the SVM/LR approach could be too noisy/too many incorrect labelings while training from scratch would be impossible. In those situations fine-tuning might be worth exploring (at least in my opinion).