machine-discovery / deer

Parallelizing non-linear sequential models over the sequence length
BSD 3-Clause "New" or "Revised" License

Proof of convergence #1

Closed mfkasim1 closed 11 months ago

mfkasim1 commented 1 year ago

If a solution exists, can we prove that the iteration always converges to it?

The perturbation theory only leads to a fixed-point iteration of the form y = g[y] = L0_inv[f] - L0_inv_L[sigma(y)]. mode=1 gives the naive fixed-point iteration, which only converges if every eigenvalue of the Jacobian of g[y] has absolute value less than 1. mode=2 gives linear mixing, which is a bit better, but not by much. With linear mixing, you can only converge if the largest eigenvalue of the Jacobian satisfies 1 < largest_eival < -(1 + eps) / (1 - eps). So convergence is not guaranteed, unless maybe we apply some nonlinear preconditioning to the equation.
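To make the eigenvalue conditions concrete, here is a minimal scalar sketch. It is not deer's code: the toy map g(y) = lam * y + c, the helper names, and the mixing convention y_new = eps * y + (1 - eps) * g(y) are all assumptions made for illustration. Under that assumed convention the mixed update has multiplier eps + (1 - eps) * lam, and it contracts only when that multiplier has absolute value below 1. For eps > 1 this works out to the interval quoted above, and for 0 < eps < 1 the same algebra gives -(1 + eps) / (1 - eps) < lam < 1.

```python
# Toy scalar model (hypothetical, for illustration only): g(y) = lam * y + c.
# Its Jacobian is the constant lam, and its fixed point is c / (1 - lam).


def make_g(lam, c=1.0):
    """Return the affine map g(y) = lam * y + c."""
    return lambda y: lam * y + c


def fixed_point(lam, c=1.0):
    """Exact fixed point of g (requires lam != 1)."""
    return c / (1.0 - lam)


def naive_step(g):
    """Naive fixed-point update: y_new = g(y)."""
    return g


def mixed_step(g, eps):
    """Linear mixing (assumed convention): y_new = eps * y + (1 - eps) * g(y)."""
    return lambda y: eps * y + (1.0 - eps) * g(y)


def converges(step, target, y0=0.0, n_iter=200, tol=1e-8, blowup=1e12):
    """Run the iteration and report whether it ends up within tol of target."""
    y = y0
    for _ in range(n_iter):
        y = step(y)
        if abs(y) > blowup:  # iterate has clearly diverged
            return False
    return abs(y - target) < tol


def mixed_multiplier(lam, eps):
    """Multiplier of the mixed update; contraction needs abs(...) < 1."""
    return eps + (1.0 - eps) * lam


if __name__ == "__main__":
    for lam, eps in [(0.5, 0.5), (-2.0, 0.5), (1.5, 0.5), (1.5, 2.0)]:
        g = make_g(lam)
        target = fixed_point(lam)
        print(
            f"lam={lam:+.1f} eps={eps:.1f} "
            f"naive={converges(naive_step(g), target)} "
            f"mixed={converges(mixed_step(g, eps), target)} "
            f"multiplier={mixed_multiplier(lam, eps):+.2f}"
        )
```

In this toy setting, mixing with eps = 0.5 rescues lam = -2 (where the naive iteration diverges) but not lam = 1.5, while eps = 2 handles lam = 1.5. No single eps covers every Jacobian spectrum, which is the sense in which convergence is not guaranteed.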

mfkasim1 commented 11 months ago

A lot has changed since this was opened, so this issue is no longer relevant.