lbl-anp / becquerel

Becquerel is a Python package for analyzing nuclear spectroscopic measurements.
Other
43 stars 16 forks source link

Refactor and simplify calibrations #248

Closed markbandstra closed 3 years ago

markbandstra commented 3 years ago

Adds a class for generic calibrations called Calibration, in addition to subclasses for specific types of calibration functions.

To create an arbitrary calibration function, simply write the function as a string, where x is the independent variable and p are the parameters, each of which must be referenced explicitly with an integer index. For example, here is how to instantiate a linear calibration with intercept of 5 and slope of 2:

cal = Calibration("p[0] + p[1] * x", [5, 2])

Here is a linear calibration with no intercept and a slope of 2:

cal = Calibration("p[0] * x", [2])

Calibrations are callable like functions, so cal(x) will return its value at x:

cal(10)
# Out: 20.0

Calibrations can include special functions, basically anything that asteval can handle, including many numpy functions:

cal = Calibration('p[0] + p[1] * x + p[2] * np.sqrt(x) + p[3] * np.exp(x / p[4])', [5, 2, 3, 7, 1000], domain=[0, 3000])

To fit a calibration function to points, you can use the set_points and add_points methods and then call fit, or instantiate the class using from_points:

cal = Calibration.from_points("p[0] * x", [0, 1000], [0, 200])
cal.params
# Out: array([0.2])

There are initialization methods for Calibration that are designed to make the creation of different functional forms easier. For example, here is a 4th order polynomial, where the order is inferred from the number of parameters:

cal = Calibration.from_polynomial([5, 2, 4, 1, 3])
cal.expression
# Out: 'p[0] + p[1] * x ** 1 + p[2] * x ** 2 + p[3] * x ** 3 + p[4] * x ** 4'

One of the initialization methods generates an expression that interpolates between the calibration points:

cal = Calibration.from_interpolation([0, 1000], [0, 200])
print(cal.expression)
# scipy.interpolate.interp1d([0, 1000], [0, 200], fill_value="extrapolate")(x)

This PR also adds tools to perform HDF5 I/O, which is fully supported for Calibration instances. So for any calibration you can do

cal.write("filename.h5")

and it will just work (Calibration.read("filename.h5") as well). You can also give an argument that is an already open HDF5 file or group.

Closes #245 Commits were cherry-picked from #244

TODO:

jvavrek commented 3 years ago

One other request: fit statistics. R^2 and (reduced) chi2 are the big ones. I can't remember if scipy.optimize.least_squares() provides those or we would need to do it ourselves.

markbandstra commented 3 years ago

TODO items from a recent meeting:

jvavrek commented 3 years ago

I don't know if this is necessarily a responsibility for becquerel, but a useful feature would be the option to scale the x/y inputs (x especially) before performing the calibration, and then rescaling the calibration back after the fact. For various reasons I'm trying out a high-order polynomial fit and running into overflows when the input ADC value is > 4000 or so. I can divide by 1000 on my end, but keeping track of whether different calibrations use the downscaled values will be a pain.

markbandstra commented 3 years ago

For various reasons I'm trying out a high-order polynomial fit and running into overflows when the input ADC value is > 4000 or so.

Does this happen when you are calling fit? If so you can send the bounds kwarg to limit the range of the offending parameter(s).

jvavrek commented 3 years ago

For various reasons I'm trying out a high-order polynomial fit and running into overflows when the input ADC value is > 4000 or so.

Does this happen when you are calling fit? If so you can send the bounds kwarg to limit the range of the offending parameter(s).

It's when I call Calibration.from_points(). I'm not sure where exactly inside there that things go awry. I don't think the bounds are the issue, I think it is just the high-order polynomial and ADC input values in the thousands. E.g., I can get away with

p[0] + p[1] * x ** 1 + p[2] * x ** 2 + p[3] * x ** 3

but for the next higher order I need to do something like

p[0] + p[1] * (x/1000) ** 1 + p[2] * (x/1000) ** 2 + p[3] * (x/1000) ** 3 + p[4] * (x/1000) ** 4

Edit: well, even an ADC value of 10000 should cap out at only (10000)^4 = 1e16. Maybe it is the bounds...

markbandstra commented 3 years ago

I would try setting the bounds, and if that still doesn't solve it you can always put the scaling directly into the expression, as you have above:

p[0] + p[1] * (x/1000) ** 1 + p[2] * (x/1000) ** 2 + p[3] * (x/1000) ** 3 + p[4] * (x/1000) ** 4

is a valid expression. Also check that the new property domain is limited to (0, 4000) -- the default domain is much wider and it will test values throughout the entire domain when you instantiate the calibration.