uwhpsc-2016 / homework1

Homework #1

Gradient Descent Termination Issues #18

Open afarahatoutset opened 8 years ago

afarahatoutset commented 8 years ago

How are we going to be tested for termination? Certain combinations of a large sigma, initial guess, and function will cause gradient_descent not to converge.

Are we supposed to detect that in the implementation? Is the grading test-suite checking for termination?

quantheory commented 8 years ago

Unless otherwise indicated, you can assume that we'll be testing with examples that are well within the region where gradient_descent should converge for a given problem.

In a real-world problem, you would probably want to make sure that gradient_descent exits if it does not seem to be converging. For example, you could add an optional argument that specifies a maximum number of iterations, after which the function might raise an exception or issue a warning. But I don't believe that this will be part of the evaluation for this problem.
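For instance, a minimal sketch of what that could look like (the argument name maxiter, the tolerance, and the exact signature here are only illustrative, not the graded interface):

```python
def gradient_descent(fprime, x0, sigma=0.1, tol=1e-8, maxiter=10000):
    """Fixed-step gradient descent on a 1D function with derivative fprime.

    Stops when successive iterates are within tol of each other, and raises
    an error if that has not happened after maxiter steps instead of looping
    forever on a diverging problem.
    """
    x = x0
    for _ in range(maxiter):
        x_new = x - sigma * fprime(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError(
        "gradient_descent did not converge within %d iterations" % maxiter)
```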

cswiercz commented 8 years ago

Confirmed.

In real life, you want checks against these situations. For this problem, the test inputs will be simple enough.

Also, in real life there is a better way to implement gradient descent that, essentially, works by determining the best choice of sigma for the current guess at each iteration. We're not using that version of the algorithm in this assignment.
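Roughly, the idea is a line search: at each iterate you pick sigma by testing candidate step sizes. A sketch using simple backtracking (my own illustration, not what this homework asks for):

```python
def gradient_descent_linesearch(f, fprime, x0, tol=1e-8, maxiter=1000):
    """Gradient descent where sigma is chosen anew at every iteration by
    backtracking until the step produces a sufficient decrease in f."""
    x = x0
    for _ in range(maxiter):
        g = fprime(x)
        sigma = 1.0
        # Halve sigma until f decreases enough (Armijo-style condition).
        while f(x - sigma * g) > f(x) - 1e-4 * sigma * g * g and sigma > 1e-16:
            sigma *= 0.5
        x_new = x - sigma * g
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("line-search gradient descent did not converge")
```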

alyfarahat commented 8 years ago

Yeah, if we apply the Newton-Raphson method to solve for the zeros of f'(x), we get the gradient map with sigma = 1/f''(x). What's bad about it is that we'll probably end up with a division by zero at inflection points, where f''(x) = 0.
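In code, that adaptive-sigma version might look something like the sketch below (fprime and fpp stand in for f' and f''; the guard on tiny f'' is exactly the division-by-zero problem near an inflection point):

```python
def newton_minimize(fprime, fpp, x0, tol=1e-8, maxiter=100):
    """Newton-Raphson on f'(x): each step is x - fprime(x)/fpp(x), i.e.
    gradient descent with the adaptive step sigma = 1/f''(x)."""
    x = x0
    for _ in range(maxiter):
        curvature = fpp(x)
        if abs(curvature) < 1e-12:
            # Near an inflection point f''(x) ~ 0, so 1/f''(x) blows up;
            # stop rather than divide by (nearly) zero.
            raise RuntimeError("f''(x) is nearly zero at x = %g" % x)
        x_new = x - fprime(x) / curvature
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Newton iteration did not converge")
```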

cswiercz commented 8 years ago

Totally. Even if you think about the method geometrically, you can see that in the 1D case approaching from one side gets you to an inflection point (assuming sigma is sufficiently small), while "approaching" from the other side actually moves you away from it. This algorithm doesn't behave well in that situation, but it is a decent "first step" toward minimization algorithms.
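To make that concrete with an example of my own (not from the assignment): for f(x) = x**3 the only critical point, x = 0, is also an inflection point, and fixed-step descent treats the two sides of it very differently.

```python
def descend_cubic(x, sigma=0.01, steps=2000):
    """Fixed-step gradient descent on f(x) = x**3, so f'(x) = 3*x*x."""
    for _ in range(steps):
        x = x - sigma * 3.0 * x * x
        if abs(x) > 1e6:      # clearly diverging; stop before overflow
            return x
    return x

print(descend_cubic(0.5))    # from the right: creeps toward the inflection point at 0
print(descend_cubic(-0.5))   # from the left: runs off toward -infinity
```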

We may end up writing better ones later, especially once we introduce MPI.