reflectivity / analysis

Data analysis for Neutron and X-ray Reflectometry
https://www.reflectometry.org/
Creative Commons Zero v1.0 Universal

Define some metric by which we can state a probability that two datasets come from the same underlying model #73

Open jfkcooper opened 1 year ago

jfkcooper commented 1 year ago

This may not strictly be part of the Analysis remit, but I know it would certainly be useful for it so I add the issue here.

If we have two datasets, both with error bars on every point (let's simplify our lives and, at least initially, assume the data points have the same x-values and that the error bars are solely in y), then we wish to know the probability that the datasets come from the same underlying distribution. That is to say, if both were measured such that the error bars on all data points became infinitesimal, the data points would all lie in the same place.

As far as I can understand, this is a solved/trivial problem for a single dataset and a known distribution, but I am unaware of anything which is able to deal with both datasets having error bars and the underlying distribution being unknown.

I can provide an almost infinite number of datasets to test any theory through HOGBEN.

Thoughts on this problem and any potential solutions are most welcome.

pkienzle commented 1 year ago

You could take the difference between the two data sets, divide by the uncertainty (variances add), and test using: $$\chi^2 = \sum_{k=1}^n \frac{(A_k - B_k)^2}{\Delta A_k^2 + \Delta B_k^2}$$ You can then reject equivalence at a given $p$-level by checking whether the statistic exceeds scipy.stats.chi2.ppf(1-p, df=n). The [Q, ΔQ] values need to be identical for both measurements, and the measurement uncertainties are assumed to be independent and approximately Gaussian.

I'm using a one-tailed test here because the low side indicates that the data sets are closer together than expected, so even less likely to be drawn from different models. This will happen naturally a certain percentage of the time. If your uncertainties are overestimated, then the χ² statistic will be unexpectedly low and you will fail to reject datasets that are actually different.
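For concreteness, here is a minimal sketch of this test as I read it (the function name and the array names yA, dyA, yB, dyB are placeholders, and it assumes both curves share the same Q grid):

```python
import numpy as np
from scipy.stats import chi2

def same_model_chi2(yA, dyA, yB, dyB, p=0.05):
    """One-tailed chi-squared test that two curves on the same Q grid
    are consistent with a single underlying model.

    Assumes independent, roughly Gaussian 1-sigma uncertainties dyA, dyB.
    Returns (statistic, p_value, reject); reject=True means the
    same-model hypothesis is rejected at the given p-level.
    """
    yA, dyA, yB, dyB = map(np.asarray, (yA, dyA, yB, dyB))
    stat = np.sum((yA - yB)**2 / (dyA**2 + dyB**2))   # variances add
    n = len(yA)
    p_value = chi2.sf(stat, df=n)       # large chi^2 => curves differ
    return stat, p_value, stat > chi2.ppf(1 - p, df=n)
```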

This test will not be sensitive to correlations in the residuals. For example, if you were doing SANS measurements with an increasing structure factor peak, then this test would not notice that several consecutive residuals around the peak have the same sign. It will still reject the null hypothesis if the peak is big enough, but this will depend on the number of points outside the peak. Larger degrees of freedom allow for more extreme values in the peak.
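As a rough illustration of that insensitivity (an invented example, not measured data): a smooth deviation of about 0.5 σ over ten consecutive points gives same-sign residuals, but barely moves the statistic.

```python
import numpy as np
from numpy.random import randn
from scipy.stats import chi2

n, reps = 75, 10000
bump = np.zeros(n)
bump[30:40] = 0.5                        # correlated, same-sign deviation in sigma units
resid = randn(reps, n) + bump            # normalized residuals
stat = np.sum(resid**2, axis=1)
reject = stat > chi2.ppf(0.95, df=n)
print("rejection rate:", reject.mean())  # only modestly above the 5% false-alarm rate
```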

Doing a quick test with data spanning several orders of magnitude, with uncertainty on the first curve at 2% and on the second at 10%, gives a distribution of values that matches the χ² pdf:

```python
import numpy as np
from numpy.random import randn
from scipy.stats import chi2
from matplotlib.pyplot import clf, hist, plot

n = 75
reps = 10000
yth = np.linspace(0.01, 100, n)          # common "true" curve
Δy1 = yth*0.02                           # 2% uncertainty on the first curve
Δy2 = yth*0.10                           # 10% uncertainty on the second curve
y1 = yth + randn(reps, n)*Δy1
y2 = yth + randn(reps, n)*Δy2
low, high = chi2.ppf([0.001, 0.999], df=n)
v = np.linspace(low, high, 200)
hist(np.sum((y2-y1)**2/(Δy1**2+Δy2**2), axis=1), bins=50, density=True)
plot(v, chi2.pdf(v, df=n))
```
[image: histogram of the statistic overlaid on the χ² pdf]

If all the data points in one curve are shifted by one standard deviation, most comparisons will reject the null hypothesis, but not all:

```python
clf()
hist(np.sum((y2-y1+np.sqrt(Δy1**2+Δy2**2))**2/(Δy1**2+Δy2**2), axis=1), bins=50, density=True)
plot(v, chi2.pdf(v, df=n))
```
[image: histogram of the shifted statistic overlaid on the χ² pdf]

This is about equivalent to adding n=75 to the sum-squared residuals: for standard normal residuals z, E[(z+1)²] = 2, so the expected statistic roughly doubles from n to 2n. Adding 5 standard deviations to 3 points (3 × 5² = 75) gives a similar graph.

arm61 commented 1 year ago

My first guess was to use a Z-test, which seems to be what @pkienzle is suggesting.

pkienzle commented 1 year ago

@arm61 I'm not sure how you would use the Z-test in this case. Maybe check that the mean of the normalized residuals is zero? This would be less sensitive than χ² to large deviations, and symmetric deviations (such as you would get by shifting Kiessig fringes) may even leave the mean unchanged.
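For comparison, a sketch of that reading of the Z-test (same placeholder array names as above), testing whether the mean of the normalized residuals is consistent with zero:

```python
import numpy as np
from scipy.stats import norm

def mean_residual_ztest(yA, dyA, yB, dyB):
    """Z-test on the mean of the normalized residuals.

    Under the same-model hypothesis each residual z_k is roughly standard
    normal, so sqrt(n) * mean(z) ~ N(0, 1).  Symmetric deviations (e.g.
    shifted Kiessig fringes) can cancel out and leave this statistic small.
    """
    z = (np.asarray(yA) - np.asarray(yB)) / np.sqrt(np.asarray(dyA)**2 + np.asarray(dyB)**2)
    zscore = np.sqrt(len(z)) * z.mean()
    return zscore, 2 * norm.sf(abs(zscore))   # two-tailed p-value
```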

Given that a shift in the fringes will lead to a bimodal distribution in the residuals when it is large enough, I tried to improve the sensitivity of the χ² test by also testing that the residuals follow a standard normal distribution. Using a shifted sine wave with the Anderson-Darling test, the combined test does detect the shift more often, but it also rejects more often when there is no shift. In this simple example the χ² test alone is a better choice. A normality test may also be affected by any non-normality in our measurement uncertainty, so I would not recommend using it.
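For reference, one way to wire up the combined check described above (a sketch only, not necessarily how it was implemented here; the conclusion above is that χ² alone is preferable):

```python
import numpy as np
from scipy.stats import anderson, chi2

def combined_test(yA, dyA, yB, dyB, p=0.05):
    """Chi-squared test plus an Anderson-Darling normality test on the
    normalized residuals; reject if either test rejects.  This catches
    bimodal residuals from shifted fringes, at the cost of more false
    alarms when the curves really do agree."""
    z = (np.asarray(yA) - np.asarray(yB)) / np.sqrt(np.asarray(dyA)**2 + np.asarray(dyB)**2)
    n = len(z)
    chi2_reject = np.sum(z**2) > chi2.ppf(1 - p, df=n)
    ad = anderson(z, dist='norm')
    # anderson() tabulates critical values at 15, 10, 5, 2.5 and 1 % significance
    ad_reject = ad.statistic > ad.critical_values[2]   # 5% level
    return chi2_reject or ad_reject
```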

arm61 commented 1 year ago

I was referring to http://homework.uoregon.edu/pub/class/es202/ztest.html, which has the same equation as you provide above.