roatienza / Deep-Learning-Experiments

Videos, notes and experiments to understand deep learning
MIT License

Optimization in Deep Learning #17

Closed rrmina closed 6 years ago

rrmina commented 6 years ago

(screenshot of Lecture 3, Slide 23)

Hello, Prof. Atienza. You said here (Lecture 3, Slide 23) that gradient descent is the only practical way to optimize deep learning models (in particular, their loss functions). I understand that both the closed-form solution and Newton's method are computationally prohibitive, that non-iterative first-order optimization is inefficient, and that convex optimization is not an option since the loss surfaces of most deep learning models have multiple local extrema. We are therefore left only with gradient descent.
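For reference, the gradient descent update being discussed can be sketched on a toy one-dimensional loss (the quadratic here is an illustrative stand-in, not a loss from the lecture):

```python
# Toy loss: L(w) = (w - 3)^2, minimized at w = 3.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # Analytic gradient: dL/dw = 2(w - 3)
    return 2.0 * (w - 3.0)

w = 0.0    # initial parameter
lr = 0.1   # learning rate
for _ in range(100):
    w -= lr * grad(w)  # gradient descent update: w <- w - lr * dL/dw

print(w)   # converges to the minimizer w = 3
```

In deep learning the same update is applied to millions of parameters at once, with the gradient computed by backpropagation rather than by hand.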

Are there other approaches to optimization in deep learning?

roatienza commented 6 years ago

Deep learning as a whole is built on gradient-based optimization. There are areas of AI (e.g., evolutionary computation) that attempt to use gradient-free approaches.
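As a minimal illustration of the gradient-free alternative mentioned above, here is a (1+1) evolution strategy on the same kind of toy quadratic loss: it mutates the parameter with Gaussian noise and keeps the mutant only if the loss improves, never computing a gradient. This is a sketch of the general idea, not how any particular evolutionary-computing library works.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def loss(w):
    # Toy loss minimized at w = 3 (illustrative, not from the lecture)
    return (w - 3.0) ** 2

w = 0.0       # initial parameter
sigma = 0.5   # mutation step size
for _ in range(2000):
    cand = w + random.gauss(0.0, sigma)  # mutate
    if loss(cand) < loss(w):             # select: keep only improvements
        w = cand

print(w)  # approaches the minimizer w = 3 without any gradients
```

In practice such methods need far more loss evaluations than gradient descent, which is why they are rarely used for training large networks, but they apply even when the loss is non-differentiable.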