tBuLi / symfit

Symbolic Fitting; fitting as it should be.
http://symfit.readthedocs.org
MIT License

data keyword for Fit #281

Open tBuLi opened 4 years ago

tBuLi commented 4 years ago

Introduction

The data API of Fit has some drawbacks that need to be addressed. Firstly, using *args, **kwargs to pass data is not a good design when a large number of variables need to be provided with data. Secondly, the possible names for Variables are more constrained in symfit than in pure sympy. Thirdly, the data is cast internally into a dict with variables as keys and data as values anyway.

Therefore, it seems like a better idea to use such a dict in the first place:

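import numpy as np
from symfit import parameters, variables, Model, Fit

xdata = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ydata = np.array([2.3, 3.3, 4.1, 5.5, 6.7])

a, b = parameters('a, b')
x, y = variables('x, y')
model = Model({y: a * x + b})

# Proposed API: data is a dict mapping each Variable to its data.
data = {x: xdata, y: ydata}
fit = Fit(model, data=data)
fit_result = fit.execute()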
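To illustrate the third point, here is a minimal sketch of how the existing *args, **kwargs style could be reduced to such a dict. The Variable class and the legacy_to_data helper are hypothetical stand-ins written for this example, not symfit code. It also hints at where the naming constraint comes from: data passed by keyword can only be matched to variables whose names are valid Python identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical stand-in for symfit's Variable; only the name matters here."""
    name: str


def legacy_to_data(variables, *args, **kwargs):
    """Map positional and named data onto variables and return one dict."""
    if len(args) > len(variables):
        raise TypeError("more positional data than variables")
    by_name = {v.name: v for v in variables}
    # Positional data is matched to the variables in the order given.
    data = dict(zip(variables, args))
    # Named data is matched by variable name.
    for name, values in kwargs.items():
        if name not in by_name:
            raise TypeError(f"no variable named {name!r}")
        var = by_name[name]
        if var in data:
            raise TypeError(f"data for {name!r} given twice")
        data[var] = values
    return data


x, y = Variable("x"), Variable("y")
data = legacy_to_data([x, y], [1.0, 2.0, 3.0], y=[2.3, 3.3, 4.1])
print(data[x], data[y])
```

Passing the dict directly would skip this matching step altogether.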

For the basics I think this is a very clean API, but there are some more advanced features which should be supported:

Covariances

For covariances, a possible API would be

data = {x: xdata, y: ydata, Cov(x, y): xycovariance}

where Cov could be a new symbol subclass or a convenience method which returns a new variable encoding this information. Similar objects could then be made for Stdev and Var.
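As a rough sketch of the first option, Cov and Stdev could be small hashable key objects. Everything below is hypothetical: the Variable stand-in, both classes, and in particular the choice to make Cov independent of argument order are assumptions made for illustration, not existing symfit behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical stand-in for symfit's Variable."""
    name: str


@dataclass(frozen=True)
class Stdev:
    """Hypothetical key for the standard deviation of one variable."""
    var: Variable


class Cov:
    """Hypothetical key for the covariance of two variables."""

    def __init__(self, a, b):
        # A frozenset makes Cov(x, y) and Cov(y, x) the same key (assumed design).
        self.pair = frozenset((a, b))

    def __eq__(self, other):
        return isinstance(other, Cov) and self.pair == other.pair

    def __hash__(self):
        return hash(("Cov", self.pair))

    def __repr__(self):
        names = sorted(v.name for v in self.pair)
        return f"Cov({', '.join(names)})"


x, y = Variable("x"), Variable("y")
xycovariance = [0.1, 0.2, 0.1]
yerr = [0.3, 0.3, 0.4]
data = {Cov(x, y): xycovariance, Stdev(y): yerr}

# The lookup works regardless of the order of the arguments.
print(data[Cov(y, x)] is xycovariance)
```

Making the key order-independent avoids having to decide whether Cov(x, y) and Cov(y, x) may both appear in the same dict.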

Another option would be to use and expand the .sigmas dict already present on model classes:

>>> model = Model({y: a * x + b})
>>> model.sigmas
{x: sigma_x, y: sigma_y, (x, y): sigma_xy}

such that we could write

data = {x: xdata, y: ydata, model.sigmas[(x, y)]: xycovariance, model.sigmas[y]: yerr}

Global fitting

A global fitting problem will be written using indexed variables in the future:

i = Idx('i', 10)  # Run from 0, ..., 9
a, b = parameters('a, b')
a = IndexedBase(a)
x, y = variables('x, y', cls=IndexedBase)
model = Model({y[i]: a[i] * x[i] + b})

data = {x: (xdata1, ..., xdata10), y[0]: ydata1, ..., y[9]: ydata10}
fit = Fit(model, data=data)
fit_result = fit.execute()

where x and y use the two different allowable styles of providing data to indexed variables. This can be combined with either of the aforementioned covariance API styles.
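A minimal sketch of how the two styles could be normalised into a single per-element dict is given below. IndexedVariable is a hypothetical stand-in for an IndexedBase-style variable, and expand_indexed is a hypothetical helper; neither is symfit or sympy code, and the (name, index) representation of an element is an assumption made to keep the example self-contained.

```python
class IndexedVariable:
    """Hypothetical stand-in for an IndexedBase-style variable."""

    def __init__(self, name):
        self.name = name

    def __getitem__(self, i):
        # An indexed element is represented here as a hashable (name, index) pair.
        return (self.name, i)


def expand_indexed(data):
    """Return a dict keyed only by individual elements such as y[0], y[1], ..."""
    flat = {}
    for key, values in data.items():
        if isinstance(key, IndexedVariable):
            # Style 1: one sequence of datasets for the whole indexed variable.
            for i, dataset in enumerate(values):
                flat[key[i]] = dataset
        else:
            # Style 2: the key already refers to a single element.
            flat[key] = values
    return flat


x, y = IndexedVariable("x"), IndexedVariable("y")
data = {x: ([1.0, 2.0], [3.0, 4.0]), y[0]: [2.3, 3.3], y[1]: [4.1, 5.5]}
flat = expand_indexed(data)
print(flat[x[1]], flat[y[0]])
```

After this step every dataset is addressed the same way, whichever style the user chose.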

pckroon commented 4 years ago

In principle, I agree so long as the current syntax remains available. Also, proper type checking is required (are the keys to data really Variables?). Leftover edge cases would be mixing the new data keyword (keyword only?) and the existing syntax.
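To make the type checking concrete, here is a rough sketch of what such a check could look like. Variable and validate_data are hypothetical names used only for this example, and refusing to mix the two styles is just one possible policy for the edge case, assumed here for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical stand-in for symfit's Variable."""
    name: str


def validate_data(data, ordered_data=(), named_data=None):
    """Check a proposed `data` dict before it is used for fitting."""
    if not isinstance(data, dict):
        raise TypeError("data must be a dict")
    bad_keys = [key for key in data if not isinstance(key, Variable)]
    if bad_keys:
        raise TypeError(f"data keys must be Variables, got {bad_keys!r}")
    # Assumed policy: refuse to mix the new keyword with the existing syntax.
    if data and (ordered_data or named_data):
        raise TypeError("pass data either via `data` or via the existing syntax")
    return data


x, y = Variable("x"), Variable("y")
validate_data({x: [1.0, 2.0], y: [2.3, 3.3]})  # passes silently

try:
    validate_data({"x": [1.0, 2.0]})  # a string key, not a Variable
except TypeError as err:
    print(err)
```

A more lenient alternative would be to merge both sources and only raise when the same variable receives data twice.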

data = {x: xdata, y: ydata, Cov(x, y): xycovariance}

I like this idea, augmented with Stdev and Var. Var would create a Variable if given a string, and would otherwise do nothing. We should very explicitly not create a Parameter analogue, since these shorthands should only be used to create data dicts, and parameters have no place in those. I'd like some further debate on the name of Stdev. An alternative could be Sigma? StdErr? StdDev?

data = {x: xdata, y: ydata, model.sigmas[(x, y)]: xycovariance, model.sigmas[y]: yerr}

This is very very ugly. I veto this.

The global fitting example you give looks quite OK. I'm not super happy with how to create the Parameters and Variables, but that's not for this issue.