Closed clpetix closed 5 months ago
@mphoward I've fixed this bug and added a unit test to catch it in the future! I think the tests are failing related to numpy 2.0, so this should be good for you to look at when you get a chance!
Thanks! This should fix the issue.
While investigating this and looking at the manuscript, I realized that the LineSearch
is using unscaled variables. This is consistent with what is written in the documentation, but I think is inconsistent with the SteepestDescent
using scaled variables to take steps. Could you look at this to confirm and see if you think we should add scale
as an argument of find
? We could default it to scale=1.0
for backward compatibility.
If you agree, I would try to fix that in this PR since it is a related issue.
Thanks! This should fix the issue.
While investigating this and looking at the manuscript, I realized that the
LineSearch
is using unscaled variables. This is consistent with what is written in the documentation, but I think is inconsistent with theSteepestDescent
using scaled variables to take steps. Could you look at this to confirm and see if you think we should addscale
as an argument offind
? We could default it toscale=1.0
for backward compatibility.If you agree, I would try to fix that in this PR since it is a related issue.
So currently SteepestDescent
is being updated using the scaled variables but LineSearch
is checking based on unscaled variables? I think we would just need to add in scale=1.0
as suggested and turn:
https://github.com/mphowardlab/relentless/blob/09505911377030e1db8786d6d8d412bbe785589e/src/relentless/optimize/method.py#L192
to
x.value = start.variables[x] + scale**2 * new_step * d[x]
.
I think we need the scale**2
because the gradient doesn't have the extra factor from using scaled variables like it does in steepest descent here:
https://github.com/mphowardlab/relentless/blob/09505911377030e1db8786d6d8d412bbe785589e/src/relentless/optimize/method.py#L400
Does this seem right to you, @mphoward ?
I think we need to apply one scale
factor to any gradient that is dotted against d
, which turns it into the gradient with respect to the scaled parameters. See Eq. (18) of the draft manuscript.
I also think when we calculate d
in the first place, we should be dividing by scale
. That would calculate the descent direction in scaled coordinates. See Eqs. (17) and (18) of the draft manuscript.
If you scale d
when you compute it, then I think you will need to multiply by one factor of scale
in line 192 to make the update apply to the unscaled parameters.
@mphoward that all makes sense! I think I have those changes implemented as you noted!
Thanks, here are a few things I noticed:
@mphoward here are those changes!
See issue #251