lessw2020 / Ranger-Deep-Learning-Optimizer

Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase
Apache License 2.0

Too large step_size at initialization stage #15

Open e-sha opened 5 years ago

e-sha commented 5 years ago

I found that step_size is too high during the first 5 steps. The problem is in this code:

if N_sma >= self.N_sma_threshhold:
    step_size = math.sqrt((1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4) * (N_sma - 2) / N_sma * N_sma_max / (N_sma_max - 2)) / (1 - beta1 ** state['step'])
else:
    step_size = 1.0 / (1 - beta1 ** state['step'])

If betas are set to (0.9, 0.999), the internal variables change as follows:

| state['step'] | step_size  |
| ------------: | ---------: |
|             1 | 10         |
|             2 | 5.26315789 |
|             3 | 3.6900369  |
|             4 | 2.90782204 |
|             5 | 2.44194281 |
|             6 | 0.00426327 |
|             7 | 0.00524248 |
|             8 | 0.00607304 |
|             9 | 0.00681674 |
|            10 | 0.00750596 |
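
For reference, these numbers can be reproduced with a small standalone script that just re-evaluates the formula above (assuming the default N_sma_threshhold of 5):

```python
import math

beta1, beta2 = 0.9, 0.999
N_sma_threshhold = 5                     # assumed Ranger/RAdam default
N_sma_max = 2.0 / (1.0 - beta2) - 1.0    # = 1999 for beta2 = 0.999

for step in range(1, 11):
    beta2_t = beta2 ** step
    N_sma = N_sma_max - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if N_sma >= N_sma_threshhold:
        # rectified (RAdam) step size
        step_size = math.sqrt(
            (1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4)
            * (N_sma - 2) / N_sma
            * N_sma_max / (N_sma_max - 2)
        ) / (1 - beta1 ** step)
    else:
        # warm-up fallback: bias-corrected momentum step only
        step_size = 1.0 / (1 - beta1 ** step)
    print(f"{step:2d} | {step_size:.8f}")
```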

Note that step_size does not depend on the gradient value, and it scales the learning rate. Thus RAdam aggressively moves the weights away from their initial values during the first steps, even if they have a good initialization.

Would it be better to set step_size to 0 when N_sma < self.N_sma_threshhold?
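
Concretely, the change I have in mind would only touch the else branch, roughly like this (just a sketch, not a tested patch):

```python
if N_sma >= self.N_sma_threshhold:
    step_size = math.sqrt(
        (1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4)
        * (N_sma - 2) / N_sma
        * N_sma_max / (N_sma_max - 2)
    ) / (1 - beta1 ** state['step'])
else:
    # proposed: take no step while the variance estimate is still
    # unreliable, instead of a large bias-corrected momentum step
    step_size = 0.0
```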

lessw2020 commented 5 years ago

Hi @e-sha - thanks for pointing this out! Offhand, yes, 0 looks like it would be a better choice, but I will need to test and see. Can you test it if you have time today? I will try to test it later this evening and can then update the code if that turns out to be the best option (which it appears to be). I also have some ideas from a couple of other optimizers that might work better than 0 for the first five steps, but I won't have time to test those until later (see RangerQH, for example). Thanks!