patrick-kidger / optimistix

Nonlinear optimisation (root-finding, least squares, ...) in JAX+Equinox. https://docs.kidger.site/optimistix/
Apache License 2.0

Behavior of BFGS #47

Open NightWinkle opened 4 months ago

NightWinkle commented 4 months ago

Hi,

I'm not sure if this is the desired behavior, but BFGS stops immediately if f(y0) == 0. As an example:

from optimistix import minimise, BFGS, rms_norm
import equinox as eqx
import jax.numpy as jnp

@eqx.filter_jit
def f(y, args):
    return jnp.sum(jnp.square(y))

N = 2

def f2(y, args):
    # Same objective shifted by a constant, so that f2(y0) == 0 at y0 = ones.
    return f(y, args) - N

y0 = jnp.ones((N,))

# Minimising f behaves as expected and converges to y = 0.
res = minimise(
    f,
    BFGS(rtol=1e-13, atol=1e-13, norm=rms_norm),
    y0=y0,
    max_steps=1024,
)

print(res.stats["num_steps"])
assert jnp.allclose(res.value, jnp.zeros((N,)))

# Minimising f2 stops immediately because f2(y0) == 0, so this assertion fails.
res = minimise(
    f2,
    BFGS(rtol=1e-13, atol=1e-13, norm=rms_norm),
    y0=y0,
    max_steps=1024,
)

print(res.stats["num_steps"])
assert jnp.allclose(res.value, jnp.zeros((N,)))

This is because the termination condition is already satisfied at the first step. It's not really a big problem, since you can just offset the cost by a constant (see the sketch below), but it took me a while to figure out where the problem was coming from. Maybe the initial value for f_info should be set to infinity, or maybe this should be mentioned in the docs?
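
A minimal sketch of the constant-offset workaround, reusing f, f2, y0 and N from the script above; the helper offset_cost is purely illustrative and not part of optimistix:

# Sketch of the workaround: shift the objective so that it is nonzero at y0.
# `offset_cost` is a hypothetical helper, defined here only for illustration.
def offset_cost(fn, c):
    def wrapped(y, args):
        return fn(y, args) + c
    return wrapped

res = minimise(
    offset_cost(f2, 1.0),  # f2(y0) + 1.0 != 0, so termination cannot fire at y0
    BFGS(rtol=1e-13, atol=1e-13, norm=rms_norm),
    y0=y0,
    max_steps=1024,
)
assert jnp.allclose(res.value, jnp.zeros((N,)))  # the argmin is unchanged by the offset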

NightWinkle commented 4 months ago

Yes, but notice I initialize at $y = 1$. The minimum is unique and at $0$.

On Thu, Feb 29, 2024, 20:37, Jason Rader @.***> wrote:

Hmm, does this persist for functions whose derivative is not $0$ at $0$? $y^T y - N$ has derivative $0$ at $y = 0$. It's unsurprising if BFGS does not move when initialised at a critical point.

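To make the point concrete, a quick self-contained check (a sketch only, re-defining f2 with N = 2) that the gradient at the initial point y0 = 1 is nonzero, so the early stop is not explained by starting at a critical point:

import jax
import jax.numpy as jnp

def f2(y, args):
    return jnp.sum(jnp.square(y)) - 2.0  # N = 2

y0 = jnp.ones((2,))
print(jax.grad(f2)(y0, None))  # -> [2. 2.], nonzero at y0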

patrick-kidger commented 4 months ago

Ah, I think I agree. In fact I think we might be making the same mistake in other solvers as well, e.g.

https://github.com/patrick-kidger/optimistix/blob/57be72f9c0c9db02799d38ab6fcd97e98fe779dd/optimistix/_solver/gradient_methods.py#L142

I'd be happy to take a pull request (+ some tests) adjusting this!
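
For concreteness, a conceptual sketch of the kind of adjustment being suggested, using hypothetical names rather than the actual optimistix internals: carry the previous objective value initialised to +inf, so that a first-step decrease test cannot pass even when f(y0) == 0.

import jax.numpy as jnp

def init_f_prev():
    # Hypothetical: seed the "previous objective value" with +inf instead of f(y0).
    return jnp.asarray(jnp.inf)

def small_decrease(f_new, f_prev, rtol, atol):
    # Hypothetical Cauchy-style test on the change in the objective.
    return jnp.abs(f_new - f_prev) < atol + rtol * jnp.abs(f_prev)

# With f_prev = inf, |f_new - inf| = inf and the test is False, so the solver
# cannot terminate at the very first step just because f(y0) == 0.
print(small_decrease(0.0, init_f_prev(), rtol=1e-13, atol=1e-13))  # False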