Btw, for my current configuration I'm using this LR reducer with the Adam optimizer, LR set to 0.001, and regular tf training:
```python
import tensorflow as tf

lr_reducer = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_classification_loss", patience=3, min_lr=1e-6, mode='min')
```
SGD + momentum works better than Adam for finetuning. You can also check SGDW / AdamW from tensorflow-addons. It also seems better to use tfa.optimizers.Lookahead when finetuning. For the LR schedule, tf.keras.experimental.CosineDecayRestarts with SGD, also starting from initial_learning_rate=1e-3, is worth using for finetuning (see the combined sketch after the example below). StochasticDepth, which also comes from tensorflow-addons, may also help:
```python
# survivals=(1, 0.8): stochastic depth, block survival probability tapering from 1.0 to 0.8 with depth
model = efficientnet_v2.EfficientNetV2L(input_shape=(None, None, 3), survivals=(1, 0.8), dropout=1e-6, classes=1000)
```
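Not something prescribed by the repo, but a minimal sketch of how those optimizer suggestions could be wired together; the first_decay_steps, weight_decay, and Lookahead settings are placeholder values to tune for your own dataset:

```python
import tensorflow as tf
import tensorflow_addons as tfa

# Cosine schedule with warm restarts, starting from 1e-3 as suggested above.
# first_decay_steps is a placeholder; pick it from your steps per epoch.
lr_schedule = tf.keras.experimental.CosineDecayRestarts(
    initial_learning_rate=1e-3, first_decay_steps=1000, t_mul=2.0, m_mul=0.9)

# SGD + momentum with decoupled weight decay (SGDW).
# The weight_decay value here is only illustrative.
sgdw = tfa.optimizers.SGDW(
    weight_decay=1e-5, learning_rate=lr_schedule, momentum=0.9, nesterov=True)

# Wrap the base optimizer in Lookahead.
optimizer = tfa.optimizers.Lookahead(sgdw, sync_period=6, slow_step_size=0.5)

# model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])
```

Lookahead simply wraps whatever base optimizer you choose, so swapping SGDW for tfa.optimizers.AdamW keeps the rest unchanged.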
Dropout or other regularization I haven't tested...

Thank you very much for your insights!
Do you have an idea of the best configuration for finetuning? Which optimizer, which LR reduction strategy, whether dropout or other regularization is needed, and whether any unusual training techniques are required?