The ‘magic sauce’ that allows neos to differentiate through the fitting process is based on an implementation of fixed-point differentiation. Αs I understand it, the gist of how this works is that if a function has a fixed point, i.e. f(x) = x for some x (e.g. a minimize(F, x_init) routine evaluated at x_init = minimum of F), then one can evaluate the gradients through a second pass of the function, evaluated close to the fixed point.
It would be nice to consolidate some thoughts (perhaps in a notebook) on the technical details for those interested. The specific algorithm used in neos can be found in section 2.3 of this paper (two-phase method).
The ‘magic sauce’ that allows neos to differentiate through the fitting process is based on an implementation of fixed-point differentiation. Αs I understand it, the gist of how this works is that if a function has a fixed point, i.e. f(x) = x for some x (e.g. a
minimize(F, x_init)
routine evaluated atx_init
= minimum ofF
), then one can evaluate the gradients through a second pass of the function, evaluated close to the fixed point.It would be nice to consolidate some thoughts (perhaps in a notebook) on the technical details for those interested. The specific algorithm used in neos can be found in section 2.3 of this paper (two-phase method).