lbl-anp / becquerel

Becquerel is a Python package for analyzing nuclear spectroscopic measurements.

Unexpected silent roundoff / precision loss in `bq.Calibration.__call__` #376

Open · jvavrek opened 1 year ago

jvavrek commented 1 year ago

I've run into this problem at least twice now: calibrating a spectrum with 32-bit int/uint channel values and a high-degree polynomial term silently returns inaccurate calibrated values. Below is an MWE comparing `bq.Calibration.__call__`, a numpy solution, and a manual calculation. At least on my machine, no warnings are produced unless I add `np.float16` to the dtypes loop, even with `warnings.simplefilter("always")`.

import warnings

import becquerel as bq
import matplotlib.pyplot as plt
import numpy as np

warnings.simplefilter("always")  # surface all warnings; none appear for these dtypes

dtypes = [int, float, np.float32, np.uint64, np.uint32, np.int32]
_, ax = plt.subplots(len(dtypes), 1, sharex=True, figsize=(6, 9))
for i, dtype in enumerate(dtypes):
    x = np.arange(0, 16384, 1).astype(dtype)
    coeffs = np.array([1e+01, 1e-01, 0, 0, 1e-13])
    cal = bq.Calibration.from_polynomial(coeffs)
    y_bq = cal(x)
    # two independent evaluations of the same polynomial for comparison
    y_np = np.sum([coeff * np.power(x, p) for p, coeff in enumerate(coeffs)], axis=0)
    y_manual = coeffs[0] + coeffs[1] * x + coeffs[2] * x * x + coeffs[3] * x * x * x + coeffs[4] * x * x * x * x

    plt.sca(ax[i])
    plt.plot(x, y_manual, label="manual", c="k")
    plt.plot(x, y_bq, label="becquerel", linestyle="dashed")
    plt.plot(x, y_np, label="numpy", linestyle="dotted")
    plt.legend(fontsize=11, frameon=False)
    plt.title(dtype, fontsize=14, y=0.65)
plt.xlabel("x")
plt.tight_layout()
plt.show()

[figure: one panel per dtype comparing the "manual", "becquerel", and "numpy" curves vs. x]

Should we raise a `CalibrationWarning` if the input to `__call__` is potentially too imprecise (`np.uint32`, `np.int32`, or lower)?
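
For concreteness, here is a minimal sketch of what that check could look like. `CalibrationWarning` and `check_input_precision` are hypothetical names for this proposal, not existing becquerel API:

import warnings

import numpy as np

class CalibrationWarning(UserWarning):
    """Proposed warning class; does not exist in becquerel yet."""

def check_input_precision(x):
    # Warn when the input dtype is 32 bits or smaller, since intermediate
    # polynomial terms can overflow or lose precision before float64 math.
    # Note this also flags float32, which may be acceptable (see below).
    x = np.asarray(x)
    if x.dtype.itemsize <= 4:
        warnings.warn(
            f"calibration input dtype {x.dtype} may be too imprecise; "
            "consider casting to float64",
            CalibrationWarning,
        )
    return x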

jvavrek commented 1 year ago

@markbandstra suggested that perhaps the 1e-13 coefficient was getting rounded to 0 under 32-bit precision. However, the behavior persists when running with 1e-6 instead, so the precision loss is not quite so straightforward:

[figure: the same per-dtype comparison with the quartic coefficient set to 1e-6 instead of 1e-13]

Note also that a `uint32` can represent 4e9 perfectly well (its maximum is about 4.29e9), so it's not simply rollover in the final result; the precision loss must be happening somewhere in the intermediate calculations.
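
For example, numpy wraps integer array arithmetic silently, so intermediate powers of a `uint32` x overflow 32 bits long before anything is cast to float. Whether this is exactly what happens inside `Calibration` is unverified, but it is the kind of failure that produces no warning:

import numpy as np

x = np.arange(0, 16384, 1).astype(np.uint32)
# x**3 stays uint32 and wraps silently: 16383**3 = 4397241253887 > 2**32 - 1
print((x**3)[-1])       # 3489710079 after wraparound, with no warning
print(int(x[-1]) ** 3)  # 4397241253887, the exact value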

The main problem we should fix is that the failure occurs silently. We should either (a) warn on 32-bit inputs (though `float32` might be OK), or (b) upcast the input to `float64` automatically (and warn when we do so).
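
A similar sketch of option (b): since `float64` represents every 32-bit integer exactly, upcasting before evaluation sidesteps the intermediate overflow entirely. Again, `CalibrationWarning` and the helper name are hypothetical, and where the cast would live inside `__call__` is an open question:

import warnings

import numpy as np

class CalibrationWarning(UserWarning):
    """Same proposed class as in the option (a) sketch."""

def coerce_to_float64(x):
    # Upcast any non-float64 input and warn that we did so.
    x = np.asarray(x)
    if x.dtype != np.float64:
        warnings.warn(
            f"upcasting calibration input from {x.dtype} to float64",
            CalibrationWarning,
        )
        x = x.astype(np.float64)
    return x

With (b), the MWE above would run unchanged but emit a warning for every non-float64 dtype instead of failing silently.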