yxu0611 / Tensorflow-implementation-of-LCNN

A Tensorflow implementation of "A Light CNN for Deep Face Representation with Noisy Labels"

Training LightCNN-29 with MsCeleb-Faces-Aligned #2

Closed: lyatdawn closed this issue 6 years ago

lyatdawn commented 6 years ago

When training LightCNN-29 with MsCeleb-Faces-Aligned, the training loss decreases slowly and the training/validation accuracy also increases slowly. Could you give some advice on speeding up the training phase? For example, the initialization method for the weights, the learning rate schedule, and so on. Thanks.

yxu0611 commented 6 years ago

@ly-atdawn, here I share my experiment settings. For weight initialization, I use "xavier" for w and a constant value of 0.1 for b; see the layer.py file: https://github.com/yxu0611/Tensorflow-implementation-of-LCNN/blob/master/train/layer.py For the learning rate, I use AdamOptimizer with an initial learning rate of 0.0001; when the loss stops decreasing, I manually decrease it by a factor of 10 (to 0.00001); see LCNN29.py: https://github.com/yxu0611/Tensorflow-implementation-of-LCNN/blob/master/train/LCNN29.py (A rough sketch of this setup is shown after the list below.)

I also think there are some other factors that can affect the training speed:

  1. Load the training data efficiently. Loading the training data frame by frame is not efficient; in my implementation, I convert the image frames to hd5 files offline.
  2. Reduce the frequency of saving logs/models.
  3. Increase your batch size.
  4. Use multi-GPU training if it is possible for you. For me, it took almost 8-10 days to finish the training procedure with around 70K identities.
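
Here is a minimal sketch of the initialization and learning-rate setup described above, in TensorFlow 1.x style. The layer name, input shape, and filter sizes below are illustrative assumptions, not the repo's actual code; see layer.py and LCNN29.py linked above for that.

```python
import tensorflow as tf

def weight_bias(name, shape):
    # Weights use "xavier" initialization, biases a constant 0.1,
    # as described above.
    w = tf.get_variable(name + '_w', shape,
                        initializer=tf.contrib.layers.xavier_initializer())
    b = tf.get_variable(name + '_b', [shape[-1]],
                        initializer=tf.constant_initializer(0.1))
    return w, b

# Feed the learning rate through a placeholder so it can be lowered
# manually from 1e-4 to 1e-5 when the loss stops decreasing.
lr = tf.placeholder(tf.float32, shape=[], name='learning_rate')
x = tf.placeholder(tf.float32, [None, 128, 128, 1], name='faces')  # grayscale face crops

w1, b1 = weight_bias('conv1', [5, 5, 1, 96])
conv1 = tf.nn.conv2d(x, w1, strides=[1, 1, 1, 1], padding='SAME') + b1
# ... rest of the LCNN-29 network and the softmax loss over identities ...
# train_op = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)
```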
lyatdawn commented 6 years ago

Thanks, I will look into this carefully.

yxu0611 commented 6 years ago

@ly-atdawn, you're welcome to raise questions here :)

lyatdawn commented 6 years ago

Why not transform the image data into the tfrecords format? In TensorFlow, tfrecords is probably the fastest way to load data, right? With h5 files, I use a GTX 1080Ti to train LightCNN-29, and most of the time is still spent loading data. Why is that?

yxu0611 commented 6 years ago

@ly-atdawn, you are right, one possibly faster way is to convert the image data to tfrecords, but at the time I didn't explore it much. I see the same behavior as you: even when I load data from hd5 files, the training speed is still slow. One way that I think could speed up training with hd5 files is to load all the hd5 files into CPU memory before training, if your training data is not too big and you have sufficient CPU memory.
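
A rough sketch of that idea with h5py is below. The file pattern and the 'images'/'labels' dataset keys are assumptions; adapt them to however you wrote your hd5 files.

```python
import glob
import h5py
import numpy as np

def load_all_h5(pattern='train_*.h5'):
    # Read every shard completely into CPU memory up front so the GPU
    # never has to wait on disk during training.
    images, labels = [], []
    for path in sorted(glob.glob(pattern)):
        with h5py.File(path, 'r') as f:
            images.append(f['images'][:])   # [:] pulls the whole dataset into RAM
            labels.append(f['labels'][:])
    return np.concatenate(images), np.concatenate(labels)

images, labels = load_all_h5()

def next_batch(batch_size):
    # Random mini-batches served straight from memory.
    idx = np.random.choice(len(images), batch_size, replace=False)
    return images[idx], labels[idx]
```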

lyatdawn commented 6 years ago

OK, I will train it for a longer time and watch the loss and accuracy. Thanks.