It is unclear to users that learning rate decay is active by default. I would argue it should default to 0, since 0.5 is quite aggressive. Training can be noisy at the beginning when starting from random weights, and the default decay behavior can easily ruin a training run.
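To illustrate why 0.5 is high, here is a minimal sketch assuming the decay is applied multiplicatively once per epoch (the function name `decayed_lr` and the exact update rule are assumptions for illustration, not the library's actual API):

```python
def decayed_lr(base_lr, decay, epoch):
    """Learning rate after `epoch` multiplicative decay steps.

    Assumes the schedule lr_t = base_lr * (1 - decay) ** t; the exact
    rule may differ per library, but with decay=0.5 the rate halves
    every epoch either way.
    """
    return base_lr * (1.0 - decay) ** epoch

base_lr = 0.1
for epoch in range(5):
    # With decay=0.5 the rate halves every epoch: 0.1, 0.05, 0.025, ...
    print(epoch, decayed_lr(base_lr, 0.5, epoch))

# With decay=0.0 the learning rate stays at base_lr throughout,
# which is the safer default proposed above.
```

Under this schedule the learning rate drops below 1% of its initial value after only seven epochs, long before a noisy run from random initialization has stabilized.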