Yes, it looks like the current way to calculate weights for fitting makes that 0-count point "too heavy". What we decided in the past is to assign weight=1 to such points, while others have weight=1/y_err:
weights = [1 / y_err if y_err != 0 else 1 for y_err in y_err]
Clearly, right now changing the Monitor value also affects the relative weighting of 0-count points, which doesn't sound correct, so the current method needs a revision. One option is to set weight=0, but then the fit might overestimate the background amplitude when it is relatively low (many points would be 0-count and excluded from fitting). Not using weights at all is another option. Do you know how it was done in your previous software?
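To make the effect concrete, here is a small illustrative sketch (not the actual application code; monitor_ratio is just a stand-in for the Monitor normalisation) of how the zero-count point becomes relatively heavier when the Monitor value changes:

import numpy as np

counts = np.array([0, 4, 100])      # raw detector counts
counts_err = np.sqrt(counts)        # Poisson estimate; 0 for the zero-count point

for monitor_ratio in (1.0, 10.0):   # rescale the errors to a different Monitor value
    y_err = counts_err * monitor_ratio
    # current rule: weight = 1/err, but a fixed weight = 1 for zero-error points
    weights = [1 / err if err != 0 else 1 for err in y_err]
    print(monitor_ratio, weights)

# monitor_ratio = 1.0  -> [1, 0.5, 0.1]
# monitor_ratio = 10.0 -> [1, 0.05, 0.01]
# The zero-count point keeps weight 1, so relative to the other points it gets
# ten times heavier simply because the Monitor value changed.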
When y is a bare count rate, y_err is estimated as sqrt(y), but for 0 counts this is certainly not true, because the error cannot be 0. y_err = 1 is then a better, but still very rough, guess. This is what the FIT program uses.
In principle, a more correct way would be to get the errors from the fit function instead of from the data points.
In a least-squares fit, the function to be minimised is
SUM(((y[i] - fun(x[i])) / y_err[i]) ** 2)
with
y_err[i] = sqrt(y[i])
In principle it would be more correct to use
y_err[i] = sqrt(fun(x[i]))
instead. However, least-squares fit libraries probably do not support this ...
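For illustration only, one way to approximate this with a generic routine is to fold sqrt(fun(x)) into the residual itself. A minimal sketch, assuming scipy is available and using a made-up Gaussian-plus-background model (not the model used by the application):

import numpy as np
from scipy.optimize import least_squares

def model(p, x):
    amp, cen, sig, bkg = p              # Gaussian peak on a flat background
    return amp * np.exp(-0.5 * ((x - cen) / sig) ** 2) + bkg

def residuals(p, x, y):
    f = model(p, x)
    # error taken from the model, clipped at 1 so it can never become 0
    err = np.sqrt(np.clip(f, 1.0, None))
    return (y - f) / err

x = np.linspace(-3, 3, 31)
y = np.random.default_rng(0).poisson(model((50, 0.0, 0.5, 1.0), x))  # synthetic counts

fit = least_squares(residuals, x0=(30, 0.1, 1.0, 5.0), args=(x, y))
print(fit.x)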
Yes, I agree that the error cannot be 0, and probably what we are missing is the instrumental error. However, I don't know if it is available for the instruments at zebra. Furthermore, some measurements could simply be outliers and should rather be discarded from the data analysis.
Concerning getting the errors from the fit function: the problem is that we use those errors to estimate the weights for the fitting itself, so we need to know them prior to fitting. Or did I misunderstand your suggestion?
The available instrumental error of all neutron instruments is the statistical error of counting, which is estimated by the square root of the counts:
ccl_io.py, line 173:
s["counts_err"] = np.sqrt(s["counts"])
However, for zero counts this does not work, so you should do something like:
s["counts_err"] = np.sqrt(np.maximum(s["counts"], 1))
What I wrote about getting the error as y_err[i] = sqrt(fun(x[i])) is not easily practical, so you probably want to stick with the above solution.
Thinking about it a little more, an idea to get a better estimate of the error of zero counts might be to look at the neighbourhood. If you have N subsequent zeros in a row, a better estimate for the error of each of these points would be 1/sqrt(N + 1). Why? With an average count rate of 1/M we would get, on average, (M - 1) times a 0 and one time a 1 (with M = N + 1), so the expected error of a single point is sqrt(1/M) = 1/sqrt(N + 1).
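A rough sketch of that neighbourhood idea (a hypothetical helper, not existing code):

import numpy as np

def counts_err_with_zero_runs(counts):
    counts = np.asarray(counts, dtype=float)
    err = np.sqrt(counts)                  # usual Poisson estimate for non-zero points
    i, n = 0, len(counts)
    while i < n:
        if counts[i] == 0:
            j = i
            while j < n and counts[j] == 0:
                j += 1                     # find the end of the run of zeros
            err[i:j] = 1.0 / np.sqrt((j - i) + 1)   # N zeros in a row -> 1/sqrt(N + 1)
            i = j
        else:
            i += 1
    return err

print(counts_err_with_zero_runs([4, 0, 0, 0, 9]))
# -> [2.  0.5 0.5 0.5 3. ]   (three zeros in a row give 1/sqrt(4) = 0.5)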
OK, this looks like a very good change: assigning an error of 1 to zero-count points from the beginning (this will also correctly scale the error when the Monitor value changes, and will not require a workaround in the fitting weights calculation).
Testing it out, it seems to solve the issue. The cause was that the Monitor value was adjusted for the troubling run 85, but the weight was calculated assuming the error was still equal to 1.
The fix is deployed on the test server.
There is a bug with the fitting of scans when at least one of the points in the scan has zero counts.
For instance, open file 8711 (experiment 20210974), do "process all", and have a look at the fit for scan 85 (hkl = 5 5 2). You can see that the background is stuck at an arbitrary, low value and cannot be tuned to a different value even when imposing a larger starting value for the linear part of the function.