Closed terkkila closed 10 years ago
Consider implementing Nesterov's accelerated gradient, momentum, and/or variance based adaptation of learning rates. This should make tuning of learning rate less critical to successful learning of model parameters.
Consider implementing Nesterov's accelerated gradient, momentum, and/or variance based adaptation of learning rates. This should make tuning of learning rate less critical to successful learning of model parameters.