Triplet loss training - Githubissues

SaadSallam7 commented 1 year ago

I was trying to train FaceNet on Kaggle using a TPU but ran into a problem. I noticed that you have trained with it before and got good results, so could you help me, please? I used the batch hard strategy with the code provided here. I compared it with your implementation and they gave the same results, so there's no problem in the implementation. I'm training on the VGGFace2 dataset, taking 32 images per person with a batch size of 1024, so each batch contains 32 different persons with 32 images each. The problem is that there's no improvement on the test set: accuracy and threshold stay constant at 0.5 and 0, even after 10 epochs.

269/269 [==============================] - ETA: 0s - loss: 1.0424

lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.000000
Improved = 0.500000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_1_0.500000.h5
Epoch 2/50
269/269 [==============================] - ETA: 0s - loss: 1.0030
lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_2_0.500000.h5
269/269 [==============================] - 191s 712ms/step - loss: 1.0030
Epoch 3/50
213/269 [======================>.......] - ETA: 5s - loss: 1.0015
lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_3_0.500000.h5
269/269 [==============================] - 190s 710ms/step - loss: 1.0015
Epoch 4/50
269/269 [==============================] - ETA: 0s - loss: 1.0012
lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_4_0.500000.h5
269/269 [==============================] - 191s 712ms/step - loss: 1.0012
Epoch 5/50
269/269 [==============================] - ETA: 0s - loss: 1.0011
lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_5_0.500000.h5
269/269 [==============================] - 192s 715ms/step - loss: 1.0011
Epoch 6/50
269/269 [==============================] - ETA: 0s - loss: 1.0011
lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_6_0.500000.h5
269/269 [==============================] - 193s 718ms/step - loss: 1.0011
Epoch 7/50
269/269 [==============================] - ETA: 0s - loss: 1.0009
lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_7_0.500000.h5
269/269 [==============================] - 193s 718ms/step - loss: 1.0009
Epoch 8/50
269/269 [==============================] - ETA: 0s - loss: 1.0008
lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_8_0.500000.h5
269/269 [==============================] - 192s 717ms/step - loss: 1.0008
Epoch 9/50
269/269 [==============================] - ETA: 0s - loss: 1.0008
lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_9_0.500000.h5
269/269 [==============================] - 193s 718ms/step - loss: 1.0008
Epoch 10/50
269/269 [==============================] - ETA: 0s - loss: 1.0008
lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_10_0.500000.h5

This is the notebook if you can take a look. Thanks in advance.

leondgarse commented 1 year ago

I cannot see your notebook; it says No saved version. Generally, triplet loss is better used after some softmax or arcface training, because in the early stage of training the model cannot mine good positive / negative pairs. You may refer to related issues like MobileFacenet SE Train from scratch #9, or the result ResNet101V2 using nadam and finetuning with triplet.
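For readers following along, here is a minimal sketch of batch hard mining in plain Python. It is not the code from the notebook or from this repository. The function names, the Euclidean distance, and the default `margin=1.0` are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Mean over anchors of max(d(anchor, hardest positive) - d(anchor, hardest negative) + margin, 0).

    The hardest positive is the farthest same-label sample, the hardest
    negative is the closest different-label sample. Anchors that have no
    positive or no negative in the batch are skipped.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [euclidean(anchor, e)
               for j, (e, l) in enumerate(zip(embeddings, labels))
               if l == label and j != i]
        neg = [euclidean(anchor, e)
               for e, l in zip(embeddings, labels) if l != label]
        if not pos or not neg:
            continue
        losses.append(max(max(pos) - min(neg) + margin, 0.0))
    return sum(losses) / len(losses) if losses else 0.0


labels = [0, 0, 1, 1]
collapsed = [[1.0, 0.0]] * 4  # every image maps to the same point
separated = [[0.0, 0.0], [0.0, 0.1], [5.0, 0.0], [5.0, 0.1]]
print(batch_hard_triplet_loss(collapsed, labels))  # 1.0, i.e. exactly the margin
print(batch_hard_triplet_loss(separated, labels))  # 0.0
```

When all embeddings collapse to one point, every distance is zero and the loss equals the margin. If the margin in the notebook were 1.0 (the thread does not state it), a loss that settles just above 1.0 would be consistent with that kind of collapse.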

SaadSallam7 commented 1 year ago

I'm sorry, you can open it now. OK, I will train it with arcface and then triplet loss, but to be honest, I don't think this is what causes the problem: an accuracy of 50% indicates that the model isn't really learning, it always gives true or always false! One last question, please: how are you initializing the dataset for online mining? In my case I read the dataset sorted, so the first 32 examples belong to one class, the next 32 to another class, and so on, which means the batches are fixed while fitting the model. I think in the original paper they sampled batches randomly.

leondgarse commented 1 year ago

I just ran some basic tests in the Colab notebook Keras_insightface_CASIA.ipynb (the last Test part), using only 4 classes for training. Though the result is not good, at least the loss is dropping and the lfw accuracy is just above 0.5.
For offline mining, the kernel function in the dataset is data.py#L445-L446, which takes image_per_class images from some randomly picked classes. It just makes sure each class has some positive samples. Technically, the regular dataset that randomly picks images without this strategy also works. For example, if we picked classes [0, 1, 1, 2, 2, 2], class 0 will just use itself as the positive.
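The sampling idea described here (a few randomly picked classes, then image_per_class images from each) can be sketched roughly as below. This is not the code at data.py#L445-L446. `sample_pk_batch`, the `images_by_class` dictionary layout, and the file names are hypothetical.

```python
import random


def sample_pk_batch(images_by_class, classes_per_batch, image_per_class, rng=random):
    """Pick classes_per_batch random classes, then image_per_class images from each.

    images_by_class: dict mapping class id -> list of image paths (hypothetical layout).
    Classes with fewer images than image_per_class are sampled with replacement.
    """
    chosen = rng.sample(sorted(images_by_class), classes_per_batch)
    batch = []
    for cls in chosen:
        images = images_by_class[cls]
        if len(images) >= image_per_class:
            picked = rng.sample(images, image_per_class)
        else:
            picked = rng.choices(images, k=image_per_class)
        batch.extend((path, cls) for path in picked)
    return batch


# Toy data: 100 classes with 40 made-up image paths each.
images_by_class = {c: [f"class{c}/img{i}.jpg" for i in range(40)] for c in range(100)}
batch = sample_pk_batch(images_by_class, classes_per_batch=32, image_per_class=32,
                        rng=random.Random(0))
print(len(batch))  # 1024
```

Drawing a fresh set of classes for every batch means the hardest positives and negatives change from step to step, unlike a dataset that is read in sorted order once and then reused with fixed batches.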
I think you are using the [0, 255] value range for model training and evaluation, which may not be good. Another tiny issue is in eval_callback.__eval_func__: there is no need to call normalize again, as you already have it normalized.
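A small sketch of both points, under stated assumptions: the thread does not say which input scaling the model expects, so the `(p - 127.5) / 128` mapping below is just one common convention, and `normalize` is assumed to mean L2 normalization of embeddings. Both helper names are hypothetical.

```python
import math


def scale_pixels(pixels):
    """Map [0, 255] pixel values to roughly [-1, 1] (assumed convention, not confirmed by the thread)."""
    return [(p - 127.5) / 128.0 for p in pixels]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


print(scale_pixels([0, 127.5, 255]))  # [-0.99609375, 0.0, 0.99609375]

once = l2_normalize([3.0, 4.0])
twice = l2_normalize(once)
# Normalizing an already unit-length vector leaves it unchanged up to rounding,
# so the second call is redundant rather than harmful.
print(all(abs(a - b) < 1e-12 for a, b in zip(once, twice)))  # True
```

Whatever scaling is chosen, the same one has to be applied during training and during evaluation.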
At least the loss should be dropping, and the lfw threshold value should not be 0. You may check the trained model, for example by running it manually on some images and comparing their similarity.
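One way to do such a manual check is sketched below: compute the cosine similarity for a few same / different pairs, then see whether any threshold separates them. This is an illustrative sketch, not the evaluation code from this repository. `best_threshold_accuracy` and its "predict same when similarity > threshold" rule are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def best_threshold_accuracy(similarities, is_same):
    """Try each observed similarity as a threshold; predict 'same' when sim > threshold."""
    best_acc, best_thresh = 0.0, 0.0
    for thresh in sorted(set(similarities)):
        correct = sum((s > thresh) == same for s, same in zip(similarities, is_same))
        acc = correct / len(similarities)
        if acc > best_acc:
            best_acc, best_thresh = acc, thresh
    return best_acc, best_thresh


is_same = [True, True, False, False]

# A model that separates identities: same-person pairs score higher.
print(best_threshold_accuracy([0.9, 0.8, 0.1, 0.2], is_same))  # (1.0, 0.2)

# A collapsed model: every pair gets the same similarity.
sim = cosine_similarity([1.0, 0.0], [1.0, 0.0])
print(best_threshold_accuracy([sim] * 4, is_same))  # (0.5, 1.0)
```

If every pair produces the same similarity, no threshold can do better than 0.5 on a balanced pair list, which matches the "always true or always false" behaviour described earlier in the thread.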

leondgarse / Keras_insightface

Triplet loss training #118