jjhelmus / leastsqbound-scipy

Constrained multivariate least-squares optimizer for scipy

Implementation is incorrect for analytical Jacobian #6

Closed: nmayorov closed this issue 9 years ago

nmayorov commented 9 years ago

Hi!

I'm working on bounded least-squares algorithms for scipy in GSoC this year. I was surprised by the completely inadequate performance of leastsqbound on bounded problems; see my benchmarks.

I determined the reason after running the benchmarks with numerical Jacobian approximation, at which point leastsqbound started to work rather well. If we transform the variables, we need to compute the Jacobian with respect to the new variables, J' = J T, where T is the Jacobian of the transformation, T_ij = dx_i / dx'_j. This isn't done in the code, so the Jacobian that is used does not correspond to the new variables, and the performance is poor as a result.
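For illustration, here is a minimal numerical check of this chain rule; the residual function and the tanh transform below are invented for the example and are not leastsqbound's actual transform.

```python
import numpy as np

# Toy check of J' = J T for a residual r(x) and an elementwise
# transform x = g(x'); everything here is made up for illustration.
def r(x):
    return np.array([x[0]**2 + x[1], np.sin(x[0]) * x[1]])

def J(x):  # Jacobian of r with respect to x (external variables)
    return np.array([[2 * x[0], 1.0],
                     [np.cos(x[0]) * x[1], np.sin(x[0])]])

g = np.tanh                          # example transform x = g(x')
dg = lambda z: 1.0 - np.tanh(z)**2   # diagonal of T = dx/dx'

x_int = np.array([0.3, -0.7])
J_int = J(g(x_int)) * dg(x_int)      # chain rule: column j scaled by dg_j

# Finite-difference Jacobian of the composed map r(g(x')) agrees:
eps = 1e-7
J_fd = np.column_stack([
    (r(g(x_int + eps * e)) - r(g(x_int))) / eps
    for e in np.eye(2)])
print(np.allclose(J_int, J_fd, atol=1e-5))  # True
```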

Update for clarification:

These lines are the source of the error. We need to pass a modified Jacobian to leastsq. I know that you apply the transformation at the end (line 347), but that isn't enough.

newville commented 9 years ago

@nmayorov I have a hard time following your table. "g_norm" isn't defined, you say that "value" is "for convenience" but otherwise undefined, and you identify "nfev" as an important column. I'd suggest that the number of evaluations is of secondary importance in comparison to getting a good estimate of the parameter values. As far as I can tell, that information is missing from your comparison. Perhaps it is what "g_norm" means? Anyway, please define all your terms.

If I understand correctly, you're saying that the Jacobians are not calculated correctly when the bounds transformation is used. It is correct when done internally (i.e., with lmdif instead of lmder), as the minimization is done fully on the "internal" variables. You didn't actually mention whether you are even supplying Jacobian functions for any of your tests. If you are supplying Jacobians, it sort of depends what you expect them to do.

I do believe your basic point that the results with user-supplied Jacobians are confusing, and it would be better to transform the result of the user-supplied Jacobian (expected to work solely on "external variables") back to "internal" variables. Is that a correct interpretation? Can you provide PRs for leastsqbound and lmfit (which has diverged slightly from leastsqbound)?

jjhelmus commented 9 years ago

I can't add much to this discussion beyond saying that I would be happy to accept any changes to leastsqbound that address this issue. My use of leastsqbound does not require a Jacobian, so that portion of the code is not well tested.

nmayorov commented 9 years ago

@newville

Yes, this benchmark report should be better, sorry. I define g_norm in my notes at the beginning. It is the standard infinity-norm of the gradient in the unconstrained case. In a constrained case it is the infinity-norm of a properly scaled gradient that accounts for the presence of bounds; I use this concept in one of the algorithms I'm working on in GSoC. At the optimal point it is exactly zero; please read my post if you are interested in the details.

value is the objective value we are minimizing, the sum of squares. If you don't trust my mysterious g_norm, you can roughly judge performance by looking at it.
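For the unconstrained case, the quantity described amounts to the sketch below (assuming residuals r and Jacobian J evaluated at the candidate solution):

```python
import numpy as np

def g_norm_unconstrained(J, r):
    # The gradient of the objective 0.5 * ||r||^2 is J^T r; g_norm is
    # its infinity-norm, which is exactly zero at an unconstrained
    # optimum (the 0.5 convention only changes a constant factor).
    return np.linalg.norm(J.T @ r, ord=np.inf)
```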

> If I understand correctly, you're saying that the Jacobians are not calculated correctly when the bounds transformation is used. It is correct when done internally (i.e., with lmdif instead of lmder), as the minimization is done fully on the "internal" variables.

That is the main point!

> You didn't actually mention whether you are even supplying Jacobian functions for any of your tests. If you are supplying Jacobians, it sort of depends what you expect them to do.

Yes, all problems use an analytical (user-supplied) Jacobian. Again, sorry for not saying that.

> I do believe your basic point that the results with user-supplied Jacobians are confusing,

They are not confusing, they are just not satisfactory: the algorithm doesn't converge to the optimal point.

> and it would be better to transform the result of the user-supplied Jacobian (expected to work solely on "external variables") back to "internal" variables. Is that a correct interpretation?

It's not just better; it is the only correct way.

> Can you provide PRs for leastsqbound and lmfit (which has diverged slightly from leastsqbound)?

I think I will, but it's not my top priority right now (I hope you understand).

Please feel free to ask any questions, as my explanations tend not to be very clear.

nmayorov commented 9 years ago

I improved the benchmark report; hopefully it will make more sense to you now.

jjhelmus commented 9 years ago

@nmayorov Any chance you could create a small self-contained example that shows the issue? That would help me fix it faster than trying to pull an example out of your benchmarks myself.

nmayorov commented 9 years ago

Yes, give me 15 minutes and I will prepare a gist.

nmayorov commented 9 years ago

I think this is illustrative enough: https://gist.github.com/nmayorov/99edcab1f701893755f1
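The gist above is the authoritative example; a self-contained reproduction in the same spirit might look like the sketch below, assuming leastsqbound's leastsq-compatible call signature with a bounds keyword (the model and numbers here are invented):

```python
import numpy as np
from leastsqbound import leastsqbound

# Exponential-decay fit; the bounds force the internal/external
# variable transform to be active even though the optimum is interior.
t = np.linspace(0, 1, 20)
y = 2.0 * np.exp(-3.0 * t)

def residuals(p):
    a, b = p
    return a * np.exp(-b * t) - y

def jac(p):
    a, b = p
    e = np.exp(-b * t)
    # Columns are d(residual)/da and d(residual)/db in EXTERNAL variables.
    return np.column_stack((e, -a * t * e))

x0 = [1.0, 1.0]
bounds = [(0.0, 10.0), (0.0, 10.0)]

# With a numerical Jacobian the bounded fit behaves; with the analytic
# (external-variable) Jacobian the unfixed code converges poorly,
# because leastsq is actually iterating on the internal variables.
p_num, _ = leastsqbound(residuals, x0, bounds=bounds)
p_ana, _ = leastsqbound(residuals, x0, bounds=bounds, Dfun=jac)
print(p_num, p_ana)
```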

jjhelmus commented 9 years ago

Thanks for the example, very helpful. I now understand the cause of the issue; I'll try to devise a fix so that the Jacobian is properly adjusted for the variable transform. This might take a bit (a number of days), as leastsqbound is a hobby project for me. If you find a solution before then, please submit a PR.

newville commented 9 years ago

> @newville
>
> Yes, this benchmark report should be better, sorry. I define g_norm in my notes at the beginning. It is the standard infinity-norm of the gradient in the unconstrained case. In a constrained case it is the infinity-norm of a properly scaled gradient that accounts for the presence of bounds; I use this concept in one of the algorithms I'm working on in GSoC. At the optimal point it is exactly zero; please read my post (https://nmayorov.wordpress.com/2015/06/19/trust-region-reflective-algorithm/) if you are interested in the details.
>
> value is the objective value we are minimizing, the sum of squares. If you don't trust my mysterious g_norm, you can roughly judge performance by looking at it.
>
> > If I understand correctly, you're saying that the Jacobians are not calculated correctly when the bounds transformation is used. It is correct when done internally (i.e., with lmdif instead of lmder), as the minimization is done fully on the "internal" variables.
>
> That is the main point!

OK, good to know. Thanks for clarifying.

> > You didn't actually mention whether you are even supplying Jacobian functions for any of your tests. If you are supplying Jacobians, it sort of depends what you expect them to do.
>
> Yes, all problems use an analytical (user-supplied) Jacobian. Again, sorry for not saying that.
>
> > I do believe your basic point that the results with user-supplied Jacobians are confusing,
>
> They are not confusing, they are just not satisfactory: the algorithm doesn't converge to the optimal point.

Well, that's because you assumed that the Jacobians should work a certain way (specifically, without knowledge of how the bounds are implemented). I completely agree that this is a reasonable assumption.

> > and it would be better to transform the result of the user-supplied Jacobian (expected to work solely on "external variables") back to "internal" variables. Is that a correct interpretation?
>
> It's not just better; it is the only correct way.

Well, one could imagine taking the bounds mechanism into account in the Jacobian function. I am not at all suggesting that this should be expected of the user of leastsqbound or lmfit. So, I would agree that is the best way, but not that it is the only way.

It looks to me like it is necessary to have something like _external2internal_grad, which applies the chain rule for bounded variables, and to change the wrapped Dfun (wDfun) to use it. Do you all agree?

> > Can you provide PRs for leastsqbound and lmfit (which has diverged slightly from leastsqbound)?
>
> I think I will, but it's not my top priority right now (I hope you understand).

Yes, I do understand. Thanks for identifying the problem! Of course, we want to fix it soon, but probably have less time to look into this than you've already spent looking at the problem.

jjhelmus commented 9 years ago

> It looks to me like it is necessary to have something like _external2internal_grad, which applies the chain rule for bounded variables, and to change the wrapped Dfun (wDfun) to use it. Do you all agree?

That was my thought on what needs to be done.

nmayorov commented 9 years ago

> It looks to me like it is necessary to have something like _external2internal_grad, which applies the chain rule for bounded variables, and to change the wrapped Dfun (wDfun) to use it. Do you all agree?

That's exactly what is required :)
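A minimal sketch of what such a wrapper might look like, assuming the MINUIT-style sine transform that leastsqbound uses for two-sided bounds (the helper name follows the proposal above; the i2e mapping and other details are illustrative, not the library's actual code):

```python
import numpy as np

def _external2internal_grad(x_int, lower, upper):
    # dx_external/dx_internal for the two-sided sine transform
    #   x = lower + (upper - lower)/2 * (sin(x') + 1)
    return (upper - lower) / 2.0 * np.cos(x_int)

def wrap_dfun(Dfun, i2e, lower, upper):
    # Wrap a user Dfun (written in external variables) for leastsq,
    # which iterates on internal variables.  i2e maps internal to
    # external parameters; all names here are hypothetical.
    def wDfun(x_int, *args):
        J_ext = np.asarray(Dfun(i2e(x_int), *args))
        # Chain rule J' = J T: scale column j by dx_j/dx'_j.
        return J_ext * _external2internal_grad(x_int, lower, upper)
    return wDfun
```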

newville commented 9 years ago

@nmayorov Awesome! Thanks!