eringrant opened this issue 2 months ago
This is expected... but admittedly maybe not great design.
The relevant code is here:
The way this works is that we treat these as general gradient methods, which typically start by picking a descent direction and then performing a line search in that direction. Once the line search has found an acceptable point at which to stop, that location is used to start a new line search.
In the case of GradientDescent, the line search is a single step whose size corresponds to the learning rate, and the result is always treated as acceptable. This means that the 'accepted' point is the start of the line search -- which is the previous iterate.
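To make the off-by-one concrete, here is a minimal Python sketch (hypothetical, not the library's actual code) of a single-step "line search" that always accepts and reports its starting point, so the first call to step leaves the visible iterate unchanged:

```python
def grad(y):
    # Gradient of the toy objective f(y) = y**2.
    return 2.0 * y

def step(y_accepted, learning_rate=0.1):
    # The "line search" for plain gradient descent: one step of size
    # `learning_rate`, always accepted.  The point reported is the *start*
    # of the line search -- the previous iterate -- while the freshly
    # computed point only becomes the start of the *next* search.
    descent_direction = -grad(y_accepted)
    candidate = y_accepted + learning_rate * descent_direction
    return y_accepted, candidate  # (reported iterate, next search start)

y = 1.0
reported, y = step(y)
print(reported)  # 1.0 -- the first `step` reports the initial point unchanged
reported, y = step(y)
print(reported)  # 0.8 -- always one gradient step behind the internal state
```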
Off the top of my head I'm not sure how we'd change this. We might be able to tweak the logic in the above block of code to remove this off-by-one behaviour. (I'm open to suggestions on this one.)
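One possible direction, in toy Python terms (an illustration only, not a patch against the actual code): report the end of the single-step line search rather than its start, so the first call already moves the iterate.

```python
def grad(y):
    # Gradient of the toy objective f(y) = y**2.
    return 2.0 * y

def step(y, learning_rate=0.1):
    # Report the *end* of the single-step "line search" instead of its
    # start, removing the one-iterate lag.
    return y - learning_rate * grad(y)

y = 1.0
y = step(y)
print(y)  # 0.8 -- the first call now performs a visible step
```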
It seems like the first call to step of the GradientDescent optimizer doesn't perform the step operation. I didn't check if this occurs for other optimizers or do other digging, but can do so if this is not expected behavior and the cause is not immediate. Here is a MWE:

Running with GradientDescent gives:

cf. OptaxMinimiser(optax.sgd(...), ...):