saxs, histogram mode: errors are slightly under-estimated

dfranke76 commented 6 years ago

pyFAI in histogram mode slightly under-estimates the errors. The figure below was compiled from pair-wise comparisons of 1390 consecutive SAXS frames of water obtained at P12 at PETRA-III. The frames are similar up to noise as per cormap test as implemented in datcmp of ATSAS, therefore any deviations from the expected chi^2 distribution should be from the error estimates.

chi2-pval

Panel (a) shows the empirical distribution of chi^2 values of pyFAI in histogram mode, panel (b) the corresponding p-values. Note that the distribution in (a) is not centered on 1.0, but shifted slightly to the right (1.0171) and the distribution in (b) would be expected to be uniform. For reference, panel (c) shows the empirical histogram of chi^2 values of data averaged by radaver of ATSAS; panel (d) shows the corresponding, mostly uniform, p-values.

While the effect on the chi^2 values does not appear to be large in absolute numbers, the effect on the p-value distribution in (b) is severe: 3.3% of the chi^2 values have a p-value of 0.01 or less and 12.2% of the chi^2 values a p-value of 0.05. Expected would be 1% and 5%, respectively. In other words, if one uses the p-value as a decision criterion whether there are differences, the pyFAI results indicate differences where there are none two to three times as often as one would expect by chance. Again, for comparison, radaver has 1.2% and 5.6% significant results at the 1% and 5% levels, respectively.

vallsv commented 6 years ago

Hi. Thanks for this big feedback. I don't have the skill to comment anything here. There is nobody here right now. But you should have feed back in few weeks. Regards.

kif commented 6 years ago

Hi Daniel,

Thanks for the feedback. I believe this will be fixed with #520. Until now I never found a situation where this change actually provides better results. So thanks for the regression test provided.

Cheers,

kif commented 6 years ago

Hi Daniel, I wonder if the deviation you are noticing is not related to some correction applied by pyFAI which does not assume the saxs hypothesis (i.e pyFAI performs solid angle correction by default).

Could you specify what is in the plot (a)? does it correspond to the histogram of \chi^2 values calculated for every single pixel in the image, summing over the stack ? the "average value" \mu_i is calculated by pyFAI by azimuthal integration (with $\Omega$ correction) over all frames ?

$chi^2 formula$ @dfranke76

dfranke76 commented 6 years ago

Hi Jerome. Apologies for the delayed reply, bit busy over here.

What I did for panels (a) and (c): take a number of consecutive frames, any will do, eg. water. Run pyFAI on each. Then datcmp all pairwise frames with the chi^2 test (datcmp --test=chi-square *.dat). Create a histogram of the chi^2 values with your favourite tools. The implemented statistic in datcmp is:

c2 = sum((I_k - J_k)^2 / (SE(I_k)^2 + SE(J_k)^2)) / (n-1)

where I_k and J_k are the kth intensities of the two files, SE(I_K) and SE(J_k) the corresponding error estimates and 1<=k<=n.

kif commented 6 years ago

Hi Daniel,

No problem for the delay, I am in training those days and won't have time to look at the issue before next week. Could you try to deactivate the solid-angle correction (keyword: correctSolidAngle=False, polarization_factor=None) and re-perform the integration as I believe the "slightly off" may be due to those WAXS corrections applied by default, even for SAXS data.

Cheers,

Jerome

kif commented 6 years ago

@dfranke76 , Hi Daniel, I setup a numerical experiment simulating a SAXS experiment with a poissonnian detector. https://gist.github.com/kif/f5b5407e292fa209c05675a3d694b943 Could you have a look at this gist and validate the approach when there is no pixel splitting. Cheers

Jerome

dfranke76 commented 6 years ago

Could you try to deactivate the solid-angle correction (keyword: correctSolidAngle=False, polarization_factor=None) and re-perform the integration as I believe the "slightly off" may be due to those WAXS corrections applied by default, even for SAXS data.

Could please be more specific on where/how to turn this off? On calibration, in the .poni, in the sources, ...? Thanks.

dfranke76 commented 6 years ago

I setup a numerical experiment simulating a SAXS experiment with a poissonnian detector. Could you have a look at this gist and validate the approach when there is no pixel splitting.

I'm not fluent in the Python packages used and can only follow the basics from scanning through.

The simulation is the tricky bit and, given some recent experiences, I'd be wary about the direct Poisson simulation - I've observed some odd artifacts not present in real data.

In Figure 13, you may use the following snippet to compute the 99% confidence interval (set m to the actual number of points in your simulated 1d data): >>> from scipy.stats import chi2 >>> m = 1000 >>> chi2.ppf([0.005, 0.995], m) / (m - 1) [ 0.88945298, 1.12006813 ] This means, for nimg=1000, you have (nimg-1) x nimg / 2 = 499500 comparisons, therefore only 499500 x 0.005 ~= 2500 should be less than the lower limit and only ~2500 should be above the upper limit. Alternatively, calculate the p-values of all 499500 comparisons and verify that the histogram of p-values is uniform.

In Figure 14, why do you fit a Gaussian to the histogram of chi-square values? Try an actual chi-square distribution with the corresponding degrees of freedom (i.e. m-1 from before) instead.

Does this help?

kif commented 6 years ago

Hi Daniel,

I have been convinced to use synthetic data to avoid experimental artefacts. Getting actual poissonian data is not that simple, even with Pilatus detectors due to hardware flatfield correction and interpolated inter-module pixels.

Concerning the number of outliers, I got 2573 (below) and 2498 (above) ... or (2496, 2475) on another run. This looks in pretty good agreement with the theory.

About Fig14, I was willing to calculate precisely the center and fitted simply any "bell-shaped curve" on it.

Thanks for your suggestion about using the scipy.stat distribution. Apparently it fits pretty well. https://github.com/kif/pyFAI-tutorials/blob/master/tutorials/Variance/Variance.ipynb you can also edit it on: https://hub.mybinder.org/user/kif-pyfai-tutorials-10py7acn/notebooks/tutorials/Variance/Variance.ipynb but apparently it requires too much memory to be used.

It allows me to validates the normalization for external parameters like solid angle in the case where there is no pixel splitting.

dfranke76 commented 6 years ago

I have to ask: what are "interpolated inter-module pixels"? We have plenty of Pilatus detectors around and I've never seen anything like that. Module gaps get masked, and that's it?! Note: if necessary, the internal flat-field correction can be disabled in camserver (something like "LdFlatField ''").

kif commented 6 years ago

On Mon, 24 Sep 2018 08:00:36 -0700 Daniel Franke notifications@github.com wrote:

I have to ask: what are "interpolated inter-module pixels"? We have plenty of Pilatus detectors around and I've never seen anything like that. Module gaps get masked, and that's it?!

Modules gap is something that we mask as well. Pilatus modules (100kpix) are made of sub-modules: to quote their manual: https://public.bnl.gov/sites/rep/Equipment%20Media/User_Manual-PILATUS2-V1_4.pdf?Mobile=1

The fundamental unit of the DECTRIS detectors, the module, consists of a single fully depleted monolithic silicon sensor with an 8 x 2 array of CMOS readout chips bump-bonded to it. Each sensor is a continuous array of 487 x 197 = 94965 pixels without dead areas and covers an active area of 83.8 x 33.5 mm 2 . The readout chips are wire-bonded to the underlying print which is glued to the mounting bracket. Together with its readout control electronics the sensor with readout chips forms the complete module.

According to: http://photon-science.desy.de/research/technical_groups/detectors/e190302/e199162/CharacterizationandCalibrationofPilatus.pdf Each chip (or sub-module) is composed on 60x97 actual pixel. So if you count well, 3 pixels are missing out of the 197 and 7 are missing out of the 487. It is known those pixels are interpolated from their neighbooring pixels.

This is the theory. To visualize them, you can take a stack of your data of shape (nimg, pix_y, pix_x). The ideal is to take a good scatterer or better, the flurescent of a heavy element as secondary source.

Now you can calculate the standard deviation over the stack (same size as an image) and the median value (also along the stack, same size). The ratio of the std squared with the median should be 1 in the case of a poisonian detector. But some pixels have half the standard deviation compared to the other because they are interpolated. The are nicely layed out to form a 8x2 grid :)

Note: if necessary, the internal flat-field correction can be disabled in camserver (something like "LdFlatField ''").

I know ... beside this, old Pilatus are performing funky correction for cosmic ray removal but this fails in certain cases, creating nasty artefacts like donut-shaped zinger (bug corrected in Pilatus-3).

Cheers,

Jérôme

kif commented 5 years ago

The code for performing the propagation of signal, variance and normalization is in place more or less everywhere in pyFAI now. We will release the v0.18 soon and contains those features; even if they are activated only in the _integrate1d_ng function for compatibility issue. Indeed there will be a release dedicated to those issues, probably the 0.19 or 0.20 as it breaks the compatibility with existing code.

silx-kit / pyFAI

saxs, histogram mode: errors are slightly under-estimated #882