oeyh / NN

Cannot use scipy.optimize.minimize method='CG' to run gradient descent #1

Closed oeyh closed 5 years ago

oeyh commented 6 years ago

This is in assignment3 section 1.4 function oneVsAll()

result = minimize(lrCostFucntion, theta0, args=(X, ylabel, lmd), method='TNC', jac=True, options={'disp': True, 'maxiter':1000})

method='CG' should work too, and may even be faster, but it raises an error.

I suspect a bug inside the minimize function: it seems to change the shape of theta during the optimization, which causes a dimension mismatch in the matrix multiplication.

oeyh commented 5 years ago

Further observation: method='CG' works for all cases i in range(9) except i=8, which fails with the error: shapes (5000,401) and (1,1,401) not aligned: 401 (dim 1) != 1 (dim 1)
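The error text above is what np.dot reports when its second argument has extra leading axes. The snippet below is a small sketch that reproduces the same kind of mismatch outside of scipy, using made-up sizes (5 samples, 3 features) in place of the (5000, 401) data; it only illustrates the symptom and does not show where the extra axes come from.

```python
import numpy as np

# Hypothetical small stand-ins for the (5000, 401) design matrix and theta.
X = np.ones((5, 3))
theta_1d = np.zeros(3)                  # the shape the cost function expects
theta_3d = theta_1d.reshape(1, 1, 3)    # same values, two extra leading axes

# With a 1D theta the product is well defined.
ok_shape = np.dot(X, theta_1d).shape

# With a 3D theta, np.dot pairs the last axis of X (size 3) with the
# second-to-last axis of theta_3d (size 1), so the shapes do not align.
try:
    np.dot(X, theta_3d)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(ok_shape)
print(error_message)
```

Flattening the array with np.ravel before the multiplication removes the extra axes regardless of how they were introduced.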

oeyh commented 5 years ago

Seems to be a bug in scipy.

oeyh commented 5 years ago

Observation: in certain circumstances, scipy.optimize.minimize(..., method='CG', ...) adds dimensions to x0 (here x0=theta0) before passing it to the cost function, so the matrix multiplication in the cost function fails.

Temporary workaround: in my cost function, ravel theta first to make sure it is 1D, then reshape it into a 2D array (a column vector).
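The workaround can be sketched roughly as follows. This is not the original lrCostFucntion from the assignment; lr_cost_function and sigmoid are hypothetical names, and the body assumes a regularized logistic regression cost where the first column of X is the bias term and is left out of the penalty. The part that matters is the first line that touches theta.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))


def lr_cost_function(theta, X, y, lmd):
    """Return (cost, gradient) for regularized logistic regression.

    Assumption: column 0 of X is the bias column and is not regularized.
    """
    m, n = X.shape

    # Workaround: flatten whatever shape the optimizer hands over,
    # then rebuild an explicit (n, 1) column vector.
    theta = np.ravel(theta).reshape(n, 1)
    y = np.ravel(y).reshape(m, 1)

    h = sigmoid(X @ theta)

    # Copy of theta with the bias entry zeroed, used only for the penalty.
    reg = theta.copy()
    reg[0] = 0.0

    cost = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m
    cost += lmd / (2 * m) * np.sum(reg ** 2)

    grad = X.T @ (h - y) / m + lmd / m * reg

    # Return a 1D gradient so its shape matches a 1D x0.
    return cost, grad.ravel()
```

Because the function returns both the cost and the gradient, it can be passed to minimize with jac=True in the same way as the call at the top of this issue, with method='CG' in place of 'TNC'.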

More comments:

  1. I dug into scipy's source code hoping to find evidence that it adds a dimension to x0 by mistake, but couldn't find any. The source code is still too hard for me to read. In the future, if possible, I'd like to find that evidence, submit a report, and maybe even try to fix it and submit a pull request.
  2. CG refers to the conjugate gradient method. For more info, take a look at the Wikipedia page: https://en.wikipedia.org/wiki/Conjugate_gradient_method
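For reference, here is a minimal sketch of the linear conjugate gradient iteration that the Wikipedia page describes, for solving A x = b with a symmetric positive-definite A. Note that this is not what scipy.optimize.minimize runs: its method='CG' is a nonlinear conjugate gradient variant for general objective functions. The function name and default tolerance below are illustrative choices.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A (linear CG sketch)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)

    r = b - A @ x          # residual
    p = r.copy()           # first search direction is the residual
    rs_old = r @ r

    for _ in range(max_iter or len(b)):
        if np.sqrt(rs_old) < tol:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        # New direction: residual plus a multiple of the previous direction.
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new

    return x


A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
print(x)
```

In exact arithmetic the iteration reaches the solution of an n-by-n system in at most n steps, which is why the default iteration limit is len(b).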