Refactor and simplify calibrations

markbandstra commented 3 years ago

Adds a class for generic calibrations called Calibration, in addition to subclasses for specific types of calibration functions.

To create an arbitrary calibration function, simply write the function as a string, where x is the independent variable and p are the parameters, each of which must be referenced explicitly with an integer index. For example, here is how to instantiate a linear calibration with intercept of 5 and slope of 2:

cal = Calibration("p[0] + p[1] * x", [5, 2])

Here is a linear calibration with no intercept and a slope of 2:

cal = Calibration("p[0] * x", [2])

Calibrations are callable like functions, so cal(x) will return its value at x:

cal(10)
# Out: 20.0

Calibrations can include special functions, basically anything that asteval can handle, including many numpy functions:

cal = Calibration('p[0] + p[1] * x + p[2] * np.sqrt(x) + p[3] * np.exp(x / p[4])', [5, 2, 3, 7, 1000], domain=[0, 3000])

To fit a calibration function to points, you can use the set_points and add_points methods and then call fit, or instantiate the class using from_points:

cal = Calibration.from_points("p[0] * x", [0, 1000], [0, 200])
cal.params
# Out: array([0.2])

There are initialization methods for Calibration that are designed to make the creation of different functional forms easier. For example, here is a 4th order polynomial, where the order is inferred from the number of parameters:

cal = Calibration.from_polynomial([5, 2, 4, 1, 3])
cal.expression
# Out: 'p[0] + p[1] * x ** 1 + p[2] * x ** 2 + p[3] * x ** 3 + p[4] * x ** 4'

One of the initialization methods generates an expression that interpolates between the calibration points:

cal = Calibration.from_interpolation([0, 1000], [0, 200])
print(cal.expression)
# scipy.interpolate.interp1d([0, 1000], [0, 200], fill_value="extrapolate")(x)

This PR also adds tools to perform HDF5 I/O, which is fully supported for Calibration instances. So for any calibration you can do

cal.write("filename.h5")

and it will just work (Calibration.read("filename.h5") as well). You can also give an argument that is an already open HDF5 file or group.

Closes #245 Commits were cherry-picked from #244

TODO:

[x] add basic tools for HDF5 I/O
[x] add new Calibration class
[x] implement fit and from_points
[x] implement __str__ and __repr__
[x] pass in bounds and any other fitting parameters to fit() and from_points()
[x] allow the user to specify weights for each calibration point for use by fit()
[x] decide how strict we are going to be about extrapolation beyond calibration points
[x] determine how to organize Calibration and its subclasses within becquerel --- import them all to the top level?
[x] replace all occurrences of LinearEnergyCal with the new LinearCalibration
[x] update the energycal example notebook

jvavrek commented 3 years ago

One other request: fit statistics. R^2 and (reduced) chi2 are the big ones. I can't remember if scipy.optimize.least_squares() provides those or we would need to do it ourselves.

markbandstra commented 3 years ago

TODO items from a recent meeting:

[x] move static methods to hidden module-level functions
[x] make subclasses factory functions
[x] fix logic in from_points -- include_origin does not work right
[x] use expression everywhere
[x] specify and enforce function domain and range
[x] provide access to optional/auxiliary parameters ("q"?)
~[ ] specify whether expected to be invertible~
[x] if invertible, provide inverse expression
- calculate it automatically if none, but will be slower
- use scipy root finding
~[ ] make Calibration into EquationCalibration, and have an abstract base class~
~[ ] allow named parameters~
- ~parameters are a dict instead of an array~

jvavrek commented 3 years ago

I don't know if this is necessarily a responsibility for becquerel, but a useful feature would be the option to scale the x/y inputs (x especially) before performing the calibration, and then rescaling the calibration back after the fact. For various reasons I'm trying out a high-order polynomial fit and running into overflows when the input ADC value is > 4000 or so. I can divide by 1000 on my end, but keeping track of whether different calibrations use the downscaled values will be a pain.

markbandstra commented 3 years ago

For various reasons I'm trying out a high-order polynomial fit and running into overflows when the input ADC value is > 4000 or so.

Does this happen when you are calling fit? If so you can send the bounds kwarg to limit the range of the offending parameter(s).

jvavrek commented 3 years ago

For various reasons I'm trying out a high-order polynomial fit and running into overflows when the input ADC value is > 4000 or so.

Does this happen when you are calling fit? If so you can send the bounds kwarg to limit the range of the offending parameter(s).

It's when I call Calibration.from_points(). I'm not sure where exactly inside there that things go awry. I don't think the bounds are the issue, I think it is just the high-order polynomial and ADC input values in the thousands. E.g., I can get away with

p[0] + p[1] * x ** 1 + p[2] * x ** 2 + p[3] * x ** 3

but for the next higher order I need to do something like

p[0] + p[1] * (x/1000) ** 1 + p[2] * (x/1000) ** 2 + p[3] * (x/1000) ** 3 + p[4] * (x/1000) ** 4

Edit: well, even an ADC value of 10000 should cap out at only (10000)^4 = 1e16. Maybe it is the bounds...

markbandstra commented 3 years ago

I would try setting the bounds, and if that still doesn't solve it you can always put the scaling directly into the expression, as you have above:

p[0] + p[1] * (x/1000) ** 1 + p[2] * (x/1000) ** 2 + p[3] * (x/1000) ** 3 + p[4] * (x/1000) ** 4

is a valid expression. Also check that the new property domain is limited to (0, 4000) -- the default domain is much wider and it will test values throughout the entire domain when you instantiate the calibration.

lbl-anp / becquerel

Refactor and simplify calibrations #248