leondgarse / Keras_insightface

Insightface Keras implementation
MIT License

Distillation discussion #27

Closed. John1231983 closed this issue 3 years ago.

John1231983 commented 3 years ago

Nice work on distillation. It improves by a large margin compared with the baseline. To make clear what you are doing for distillation, could I ask some questions about the pipeline?

The pipeline as I understand it is:

leondgarse commented 3 years ago

Yes, and that's the process I posted, compared with the baseline using ArcFace only.
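
For reference, a minimal sketch of how such an embedding-level distillation setup could look in Keras. The student backbone, embedding size, and loss form here are assumptions for illustration, not the repo's exact code; the general idea is to pre-compute teacher embeddings for the training images and pull the student embeddings toward them while also training the usual ArcFace head:

```python
import tensorflow as tf
from tensorflow import keras

EMB_SIZE = 512  # assumed to match the teacher embedding size (MXNet r100)

# Hypothetical student: MobileNet backbone producing a 512-d embedding.
backbone = keras.applications.MobileNet(
    input_shape=(112, 112, 3), include_top=False, weights=None
)
pooled = keras.layers.GlobalAveragePooling2D()(backbone.output)
embedding = keras.layers.Dense(EMB_SIZE, name="embedding")(pooled)
student = keras.Model(backbone.input, embedding)

def embedding_distill_loss(teacher_emb, student_emb):
    """Cosine distance between the pre-computed teacher embedding and the
    student embedding; MSE on the raw embeddings is another common choice."""
    t = tf.nn.l2_normalize(teacher_emb, axis=-1)
    s = tf.nn.l2_normalize(student_emb, axis=-1)
    return 1.0 - tf.reduce_sum(t * s, axis=-1)

# Each training sample would carry (image, teacher_embedding, label); the
# total loss is embedding_distill_loss(...) plus an ArcFace softmax loss on
# the label, combined with some weighting factor.
```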

I'm now testing some other strategies:

About the hard-sample mining:

John1231983 commented 3 years ago

Thanks so much for the information. Regarding dropout, you mean you are using it for the student model, right? For distillation, have you tried a logits consistency loss? That is, also constraining the predictions of the teacher and the student to be the same.
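
As a point of reference, a logits consistency term is usually written as the classic soft-label KD loss: a cross-entropy (equivalently KL divergence up to a constant) between the temperature-softened class distributions of teacher and student. A minimal sketch, assuming both models output logits over the same set of identities:

```python
import tensorflow as tf

def logits_consistency_loss(teacher_logits, student_logits, temperature=4.0):
    """Soft-label distillation term: cross-entropy between the
    temperature-softened teacher and student class distributions."""
    t_prob = tf.nn.softmax(teacher_logits / temperature, axis=-1)
    s_log_prob = tf.nn.log_softmax(student_logits / temperature, axis=-1)
    # Scaling by T^2 keeps the gradient magnitude comparable to the hard loss.
    return -tf.reduce_sum(t_prob * s_log_prob, axis=-1) * temperature ** 2
```

Note that this requires the teacher and student to share the same classification head (same identity set), which is one reason embedding-level distillation is often preferred for face recognition.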

leondgarse commented 3 years ago

This figure shows some of my results: MXNet r100 as the teacher model, a MobileNet student trained on the CASIA dataset, with different optimizers (SGDW / AdamW) and different losses, as detailed in the labels.

[Figure: Selection_298 — result curves for the optimizer / loss combinations listed above]