A better fitting interface

kmnhan commented 7 months ago

We removed guess_fit since fitting is not magic, and users should be aware of what the initial parameters are. “Explicit is better than implicit.”

Arguably, this makes the fitting workflow slightly less convenient, since the independent variables now have to be specified explicitly.

Another shortcoming concerns how results are stored. The PyARPES approach of creating xarray objects that contain ModelResult objects has its advantages, but placing non-picklable objects into NetCDF-like structures is counterintuitive and could be misleading. That said, I can’t think of a better alternative…

kmnhan commented 7 months ago

Maybe add a callable accessor named lmfit or qfit that closely follows DataArray.curvefit syntax but takes an lmfit model. I think the best pythonic approach would be to use apply_ufunc, but we'll have to see how performant it is when conducting parallel fits.

kmnhan commented 7 months ago

An initial version of a callable accessor based on apply_ufunc has been added in e06982d as modelfit. It is slower than joblib parallelization but faster than expected; a couple hundred well-conditioned fits should finish in a few seconds.

It is very versatile but not as easy to use as I thought. It returns the best-fit coefficients, their standard errors, and goodness-of-fit statistics. I tried to make it also return the covariance matrix, the number of variables, and the initial parameters, but this is difficult since they may differ for each fit. One idea is to index them by parameter name and leave the entries for unrelated parameters as NaN... I should see how ambiguous this is to the user.

On the other hand, storing the y-values should be feasible, so I'll try to implement that. Maybe apply_ufunc has a nice way of handling it.

kmnhan commented 7 months ago

This should be closed by 0f7a1e0. The covariance matrix and, optionally, the ModelResult object have been added to the output. Parallelization is implemented by converting to a Dataset and parallelizing over data_vars. It may not be the most efficient way, but it works!

kmnhan / erlabpy

A better fitting interface #22