leondgarse / Keras_insightface

Insightface Keras implementation
MIT License
234 stars 56 forks

Different architecture of the provided checkpoints #127

Open aamir-mustafa-yoti opened 10 months ago

aamir-mustafa-yoti commented 10 months ago

Hi, thanks for the great work. I have noticed that the provided EfficientNetV2S checkpoints do not have exactly the same last few layers as the code path for `output_layer == "F"`:

The last few layers of the provided checkpoint are:


 F_flatten (Flatten)            (None, 25088)        0           ['dropout[0][0]']                

 F_dense (Dense)                (None, 512)          12845056    ['F_flatten[0][0]']              

 pre_embedding (BatchNormalizat  (None, 512)         2048        ['F_dense[0][0]']                
 ion)                                                                                             

 embedding (Activation)         (None, 512)          0           ['pre_embedding[0][0]']    

Whereas `output_layer = "F"` in the code gives the following:


 F_dense (Dense)                (None, 512)          12845056    ['F_flatten[0][0]']              

 reshape (Reshape)              (None, 1, 1, 512)    0           ['F_dense[0][0]']                

 pre_embedding (BatchNormalizat  (None, 1, 1, 512)   2048        ['reshape[0][0]']                
 ion)                                                                                             

 flatten (Flatten)              (None, 512)          0           ['pre_embedding[0][0]']          

 embedding (Activation)         (None, 512)          0           ['flatten[0][0]']   

I understand that this should not make any difference to the model, but is there a particular reason for doing this?

Thanks

leondgarse commented 10 months ago

Ya, it makes no difference; the change was introduced by the fix of BatchNormalization for QAT, fixing #115.
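For anyone wondering why the extra Reshape/Flatten pair around BatchNormalization is harmless: BN normalizes over the last (channel) axis, so inserting singleton spatial dimensions leaves the per-channel statistics untouched. A minimal numpy sketch of the equivalence (inference-mode BN with hypothetical statistics and parameters, not weights from the actual checkpoint):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 512)).astype("float32")  # dense output, like F_dense

def bn_inference(x, mean, var, gamma, beta, eps=1e-3):
    # BatchNormalization at inference time, normalizing over the last axis.
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Hypothetical per-channel statistics and learned BN parameters.
mean, var = x.mean(axis=0), x.var(axis=0)
gamma = rng.standard_normal(512).astype("float32")
beta = rng.standard_normal(512).astype("float32")

# Checkpoint head: BN applied directly on the (N, 512) dense output.
y_2d = bn_inference(x, mean, var, gamma, beta)

# Code head: reshape to (N, 1, 1, 512), apply BN, flatten back to (N, 512).
y_4d = bn_inference(x.reshape(8, 1, 1, 512), mean, var, gamma, beta).reshape(8, 512)
```

With identical weights the two heads produce identical embeddings; only the tensor layout seen by the BN layer differs, which is what the QAT tooling cares about.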

aamir-mustafa-yoti commented 10 months ago

Thanks for the clarification.

Another question: for how many epochs were the pre-trained 'latest_models' for EfficientNetV2S trained?

leondgarse commented 10 months ago

It's 67 epochs in total: 50 epochs from scratch, then training continued for another 17 epochs. The setup was EfficientNetV2S with swish activation, drop_connect 0.2, dropout 0.2, trained with SGD + L2 regularizer + cosine lr decay + randaug on the MS1MV3 dataset.
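For reference, the cosine lr decay in that recipe is the usual half-cosine anneal from a base learning rate toward zero over the training run. A minimal sketch (the base_lr and step counts here are hypothetical, not the repo's actual values):

```python
import math

def cosine_lr(step, total_steps, base_lr=0.1, min_lr=0.0):
    # Cosine learning-rate decay: anneal smoothly from base_lr at step 0
    # down to min_lr at total_steps, following half a cosine wave.
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cos

# Hypothetical schedule over a 50-epoch run, queried once per epoch.
schedule = [cosine_lr(epoch, 50) for epoch in range(51)]
```

The schedule starts at base_lr, passes through half of it at the midpoint, and reaches min_lr at the end; the actual warmup, restarts, and base lr used for the released checkpoints are in the repo's training scripts.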