leondgarse / Keras_insightface

Insightface Keras implementation
MIT License

Training on large datasets with a lot of identities #90

Closed abdikaiym01 closed 2 years ago

abdikaiym01 commented 2 years ago

Hi, do you have experience training on large datasets with your framework (Keras_insightface)? For example, can I use your repository to train on Glint360K or WebFace12M?

leondgarse commented 2 years ago

It doesn't work well yet. I've tried Glint360K; it can be trained, it just takes much longer. I also tried implementing the partial_fc strategy, but the training results were not satisfying. I will try it again once I get some spare time.

abdikaiym01 commented 2 years ago

OK, thank you! Do you suggest using their official implementation ( https://github.com/deepinsight/insightface/tree/master/recognition/arcface_torch ) to train on large datasets (Glint360K or WebFace12M)? Did you try their PyTorch implementation (recognition/arcface_torch)?

leondgarse commented 2 years ago

Yes, this repo still hasn't replicated the partial_fc implementation. From some testing, the current partial_fc in this repo also works on a single GPU, but my training results are not good. I haven't tried their PyTorch implementation, just read some result comparisons. Currently, training on the Glint360K dataset without partial_fc takes almost 5x as long as MS1MV3, while the total number of images is only 3x larger. So, ya, this is something that needs to be supported...
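
For reference, the core idea of partial_fc is that only the batch's positive class centers plus a random subset of negatives participate in the softmax. Below is a minimal single-GPU sketch of that sampling step, not this repo's actual implementation; the function name, the `sample_ratio` value, and the top-k sampling trick are my own illustration, and it assumes `num_sample` is at least the number of unique labels in the batch:

```python
import tensorflow as tf

def partial_fc_logits(embeddings, labels, class_centers, sample_ratio=0.1):
    # class_centers: [num_classes, emb_size] full weight matrix.
    labels = tf.cast(labels, tf.int32)
    num_classes = tf.shape(class_centers)[0]
    num_sample = tf.cast(tf.cast(num_classes, tf.float32) * sample_ratio, tf.int32)

    positive = tf.unique(labels).y  # class ids present in this batch
    # Give positives the highest scores so top_k always keeps them; the
    # remaining slots are filled by uniformly random negative classes.
    scores = tf.random.uniform([num_classes])
    scores = tf.tensor_scatter_nd_add(
        scores, positive[:, None], 2.0 * tf.ones_like(positive, dtype=tf.float32))
    _, sampled_ids = tf.math.top_k(scores, k=num_sample)

    sub_centers = tf.gather(class_centers, sampled_ids)  # [num_sample, emb_size]
    # Remap original labels into the sampled index space.
    sub_labels = tf.argmax(
        tf.cast(sampled_ids[None, :] == labels[:, None], tf.int32), axis=-1)

    norm_emb = tf.nn.l2_normalize(embeddings, axis=-1)
    norm_w = tf.nn.l2_normalize(sub_centers, axis=-1)
    return tf.matmul(norm_emb, norm_w, transpose_b=True), sub_labels
```

On a single GPU this only shrinks the matmul and its gradient; the hard part mentioned above is sharding `class_centers` across replicas so each one samples from its own slice.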

leondgarse commented 2 years ago

The partial_fc implementation is still hard for me, as it needs detailed control of how to shard / aggregate over multiple replicas. TensorFlow introduced DTensor in TF 2.9.0, and I think it's a key feature for implementing this, and also a key for training other large models / datasets. It's rather new to me and needs some learning and testing; I will try it and see if partial_fc is possible.
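
To illustrate what DTensor offers here, this is a tiny sketch (my own, assuming TF >= 2.9 and 2 visible GPUs) of sharding a classifier weight row-wise across a device mesh, so each replica would hold its own slice of class centers:

```python
import tensorflow as tf
from tensorflow.experimental import dtensor

# Hypothetical 1-D mesh over 2 GPUs; adjust devices to your machine.
mesh = dtensor.create_mesh([("model", 2)], devices=["GPU:0", "GPU:1"])

# Shard axis 0 (the class dimension) across the "model" mesh dimension,
# so each replica holds num_classes / 2 class centers.
layout = dtensor.Layout(["model", dtensor.UNSHARDED], mesh)
class_centers = dtensor.call_with_layout(tf.zeros, layout, shape=(360_000, 512))
```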

abdikaiym01 commented 2 years ago

Thank you, but insightface implemented partial_fc in PyTorch; is that because PyTorch is more comfortable for the technique introduced in the 'Partial FC' paper?

leondgarse commented 2 years ago

Though it's hard for me, I still believe TF has the same potential for implementing it, and may be just as comfortable, say, for someone more familiar with distribution strategies who writes custom gathering / training steps. As far as I can see, it's the output NormDense layer that may need a concatenate strategy when gathering weights across replicas, while currently only SUM / MEAN are available.
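
For context, the NormDense layer mentioned here is the margin-softmax output layer that produces cosine logits. A minimal sketch of what such a layer typically looks like (my own simplification, not copied from this repo's models.py):

```python
import tensorflow as tf

class NormDense(tf.keras.layers.Layer):
    """Dense layer with L2-normalized weights and inputs, so the output
    is cos(theta) between each embedding and each class center."""
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            name="norm_dense_w",
            shape=(input_shape[-1], self.units),
            initializer="glorot_normal",
            trainable=True)

    def call(self, inputs):
        norm_w = tf.nn.l2_normalize(self.w, axis=0)   # normalize each class center
        norm_x = tf.nn.l2_normalize(inputs, axis=-1)  # normalize each embedding
        return tf.matmul(norm_x, norm_w)              # cosine-similarity logits
```

With partial_fc, each replica would hold only a slice of `w` along the class axis, which is why gathering needs a concatenate over that axis rather than the SUM / MEAN reductions available today.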

leondgarse commented 2 years ago

I'm not sure if you are still looking forward to this; I just finished some basic training on Glint360K with EfficientNetV2-S. Here are some results:

abdikaiym01 commented 2 years ago

Oh, thank you, good job, I'm impressed. Could you release these pretrained models?

leondgarse commented 2 years ago

I put it here: TT_effv2_s_glint360k_mag_bs_256_test_random_0_E25_basic_model_latest.h5. Still trying some training runs, but hit some loss=NaN errors...
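
If it helps, loading a saved basic model like this should just be standard Keras usage. A sketch assuming the .h5 file is a self-contained backbone mapping 112x112 aligned faces to embeddings (the embedding size and input shape here are my assumptions):

```python
import tensorflow as tf

# compile=False skips restoring the training-time loss / optimizer state,
# which is all you need for inference with a basic (backbone-only) model.
model = tf.keras.models.load_model(
    "TT_effv2_s_glint360k_mag_bs_256_test_random_0_E25_basic_model_latest.h5",
    compile=False)
embedding = model(tf.random.uniform([1, 112, 112, 3]))  # 112x112 aligned face
print(embedding.shape)  # e.g. (1, 512) for a 512-d embedding model
```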

leondgarse commented 2 years ago

OK, it's been 1 month, and my r100 + PReLU + dropout 0.4 training using SGD + L2 regularizer + RandAug + AdaFace on the Glint360K dataset with partial FC is finally finished! Now I can claim it reproduces the partial FC result. :)
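
For anyone curious about the AdaFace part of that recipe: it scales both the angular and the additive margin by the batch-normalized feature norm, so high-norm (easy) samples get more emphasis. A rough TF sketch of the margin math per my reading of the paper (Kim et al., CVPR 2022); the function name is hypothetical, the defaults m=0.4, h=0.333, s=64 follow the paper, and the running average of batch statistics used by the official code is omitted:

```python
import math
import tensorflow as tf

def adaface_logits(cos_theta, feature_norms, labels, num_classes,
                   m=0.4, h=0.333, s=64.0, eps=1e-3):
    # cos_theta: [batch, num_classes] cosine logits from a NormDense-style layer.
    # feature_norms: [batch] L2 norms of the embeddings before normalization.
    mean, var = tf.nn.moments(feature_norms, axes=[0])
    scaler = (feature_norms - mean) / (tf.sqrt(var) + eps) * h
    scaler = tf.stop_gradient(tf.clip_by_value(scaler, -1.0, 1.0))[:, None]

    one_hot = tf.one_hot(labels, num_classes)
    theta = tf.acos(tf.clip_by_value(cos_theta, -1.0 + eps, 1.0 - eps))
    # AdaFace margins: g_angle = -m * scaler, g_add = m * scaler + m.
    theta_m = tf.clip_by_value(theta + (-m * scaler) * one_hot, eps, math.pi - eps)
    logits = tf.cos(theta_m) - (m * scaler + m) * one_hot
    return logits * s  # feed into softmax cross-entropy
```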