yinguobing / cnn-facial-landmark

Training code for facial landmark detection based on deep convolutional neural network.
MIT License

Overfitting #96

Open Rutvik21 opened 4 years ago

Rutvik21 commented 4 years ago

Hey, sorry to disturb you again. I have started training again, using your model as a reference. I am using the WFLW dataset, which has 98 landmarks. After training, the loss was around 0.002... But when I run prediction with butterfly, the accuracy is low, around 30%. So it's definitely overfitting, right? What should I do to avoid it? Currently, I am using the same model you provided.

Also, is there any pre-trained model available that can be used to predict landmarks like these?

Thank you.

yinguobing commented 4 years ago

It's hard to tell.

Generally, an overfitted model performs worse on the validation dataset than on the training dataset. You can observe this by comparing the training and validation loss values during training.
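As a rough illustration of that check, here is a minimal sketch that compares per-epoch training and validation losses. The function name `looks_overfitted`, the `patience` threshold, and the sample loss values are all hypothetical and are not part of this repository; treat it as a heuristic, not a definitive test.

```python
def looks_overfitted(train_losses, val_losses, patience=3):
    """Heuristic overfitting check on per-epoch loss histories.

    Returns True when the validation loss has not improved for at least
    `patience` epochs while the training loss kept decreasing.
    `patience` is an illustrative default, not a recommended value.
    """
    if len(train_losses) != len(val_losses) or not val_losses:
        raise ValueError("loss histories must be non-empty and equally long")

    # Epoch (0-based) at which the validation loss was lowest.
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    epochs_since_best = len(val_losses) - 1 - best_epoch

    # Has the training loss continued to fall since that epoch?
    train_still_falling = train_losses[-1] < train_losses[best_epoch]

    return epochs_since_best >= patience and train_still_falling


# Made-up loss values, for illustration only.
train = [0.050, 0.020, 0.010, 0.005, 0.003, 0.002]
val_diverging = [0.060, 0.030, 0.025, 0.030, 0.040, 0.050]
val_healthy = [0.060, 0.030, 0.020, 0.012, 0.008, 0.006]

print(looks_overfitted(train, val_diverging))
print(looks_overfitted(train, val_healthy))
```

A low final training loss on its own, such as the 0.002 mentioned above, does not tell you much without a validation curve next to it.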

There are many ways to avoid overfitting: data augmentation, a new network structure or new layers, etc. For facial landmarks, I would recommend starting with this new loss function: https://arxiv.org/abs/1711.06753
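The linked paper (arXiv:1711.06753) describes the Wing loss, which behaves like a logarithm for small errors and like L1 for large ones. Below is a minimal pure-Python sketch of that piecewise definition, not code from this repository. The defaults `w=10.0` and `epsilon=2.0` are assumptions suited to pixel-scale coordinates and would need retuning for normalized landmarks, and averaging over coordinates is a choice made here for readability.

```python
import math


def wing_loss(pred, target, w=10.0, epsilon=2.0):
    """Mean Wing loss over flat sequences of landmark coordinates.

    wing(x) = w * ln(1 + |x| / epsilon)   if |x| < w
            = |x| - C                     otherwise
    where C = w - w * ln(1 + w / epsilon) joins the two pieces continuously.
    `w` and `epsilon` are assumed defaults; tune them to your coordinate scale.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")

    c = w - w * math.log(1.0 + w / epsilon)
    total = 0.0
    for p, t in zip(pred, target):
        x = abs(p - t)
        if x < w:
            total += w * math.log(1.0 + x / epsilon)  # log regime, small errors
        else:
            total += x - c  # linear (L1-like) regime, large errors
    return total / len(pred)


# Two hypothetical landmarks, flattened as (x1, y1, x2, y2) in pixels.
predicted = [101.0, 52.0, 140.0, 80.0]
ground_truth = [100.0, 50.0, 120.0, 80.0]
print(wing_loss(predicted, ground_truth))
```

In a real training setup the same formula would be written with the framework's tensor operations so that gradients can flow through it.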

Rutvik21 commented 4 years ago

Okay, thanks, I will try. One more thing: I then tried 10000 epochs, and the prediction results seem better than before. I think I should try changing the hyperparameters and the CNN architecture.

ZlaaM commented 4 years ago

@Rutvik21 how many training steps did you use?

ZlaaM commented 4 years ago

And what is the difference between train_steps and the number of epochs?

Rutvik21 commented 4 years ago

@ZlaaM I didn't pass the argument for training steps; I only passed the number of epochs.

Here and here you can find the difference between train_steps and the number of epochs.
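In short, one step is one gradient update on a single batch, while one epoch is one full pass over the training set. A minimal sketch of the arithmetic follows. The helper names and the sample numbers (7500 images, batch size 32, 10 epochs) are hypothetical and are not taken from this thread's actual run.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


def total_train_steps(num_samples, batch_size, epochs):
    """Total gradient updates when training for `epochs` full passes."""
    return steps_per_epoch(num_samples, batch_size) * epochs


# Illustrative numbers only.
print(steps_per_epoch(7500, 32))        # 235
print(total_train_steps(7500, 32, 10))  # 2350
```

If both a step limit and an epoch count are given, training typically stops at whichever limit is reached first, though the exact behavior depends on the training script.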