Consider approximations to Gaussian resolution calculation

reflectometry / refl1d

1-D reflectometry fitting

https://refl1d.readthedocs.io/

BSD 3-Clause "New" or "Revised" License

18 stars 24 forks source link

Consider approximations to Gaussian resolution calculation #136

Open bmaranville opened 3 years ago

bmaranville commented 3 years ago

For some problems the convolve_gaussian is taking the majority of the compute time, more than the reflectivity calculation. The convolve_uniform is at least 20 times faster - should we consider an approximation to gaussian blur such as repeated box blurs?
See e.g. http://nghiaho.com/?p=1159 and http://blog.ivank.net/fastest-gaussian-blur.html and the SVG implementation: https://www.w3.org/TR/SVG11/filters.html#feGaussianBlurElement

pkienzle commented 3 years ago

Or precompute the weights. See https://github.com/reflectometry/refl1d/pull/131#issuecomment-898811724.

bmaranville commented 3 years ago

The precomputed weights route seems promising, but also comes with caveats - what if the dQ values change because of theta offset? What if the set of Q points changes because a relative theta_offset changes?

pkienzle commented 3 years ago

Theta offset is not a problem. Within the accuracy of our resolution calculation all the q values change by the same amount.

Changes to ΔQ are more problematic. You need to recompute the weights for each Q or you need to evaluate at I(Q') = I(Q + (Q-Q₀)(ΔQ'/ΔQ - 1)). Since we are sharing the calculation points across all Q₀ we are not free to move them, so maybe some kind of interpolation will work at least for small changes to ΔQ. I'm not sure how expensive that would be, or if it can just be another matrix multiply.

pkienzle commented 3 years ago

Using the numerical derivative and expanding out the convolution you will end up with Σ wk I(Qk) ⇒ Σ wk' I(Qk) where wk' = wk + [(Qk - Q₀)/(Qk - Q{k-1}) - (Q{k+1} - Q₀)/(Q{k+1} - Qk)]/ΔQ ⋅ (ΔQ' - ΔQ) for each Q₀ in your measurement (needs checking!)

This can be expressed using sparse matrix operations as G⊛I ≈ (W + DΜ) ⋅ I_calc with weight matrix W and variation matrix M both precomputed, and diagonal matrix D = [ΔQk'-ΔQk] updated whenever Δθ changes.

pkienzle commented 3 years ago

The repeated box convolution is interesting since it is closer to what we are measuring. Each slit is effectively a box filter of width Δθ. However, it's not clear to me how to apply these filters when the spacing is not uniform and each point has a different resolution. You can't run three passes updating I(Qcalc) each time because then neighbouring convolutions will start to interfere with each other. Think of the box filter as a matrix operator W on Icalc, applying the filter three times is equivalent to applying W³Icalc. Clearly there will be some cross talk between the rows of the matrix W³.

Expanding on this weight matrix approach we can (conceptually) compute W³ⱼ for each ΔQⱼ in the data, then form a new matrix W consisting of the row W³ⱼ corresponding to measured I(Qⱼ). Obviously expensive to form that many full matrices, but maybe we can generate the rows algorithmically, while being cheaper than computing a bunch of erf and exp functions. If you can do this "on-the-fly" then can modify the inner loop of the uniform distribution to compute the repeated uniform.

It may be that constructing W³ⱼ requires the equivalent of running the filter three times. You can still do this on the fly with a couple of work arrays. The first pass runs the box filter over twice the window width on the original data, storing it in a work array, the second pass runs over the window width storing in a second work array and the third pass is a simple sum-product over the final work array to arrive at the target G⊛I. Total cost for a fixed window of length k is O(n⋅(3k² + k)) compared to Ο(n⋅k) for the current algorithm, which is to say that I expect it to be a lot slower.

pkienzle commented 3 years ago

The W + DM form gets increasingly bad at the tails since the interpolation point δQ = Q'-Q scales with distance Q-Q₀. For larger changes in ΔQ you may end up with Q' outside the original [Qk, Q{k+1}] interval.

We would do better by moving to the next interval for our estimate. So instead of doing a strict sparse multiply, we can roll our own, with the Q' values interpolated into I_calc using a "stretch factor" ΔQ'/ΔQ giving Q' = Q₀ + (Q - Q₀)⋅ΔQ'/ΔQ. After the one-time cost of building the weight matrix this will be about as fast as the uniform convolution.

Going one step further, we can avoid the expensive gaussian calculations almost entirely by computing the weights for a standard gaussian with k points then use the interpolation approach to stretch this into whatever Q values we happened to have calculated. As a result, we will no longer be privileging the nominal Q, ΔQ: all choices of theta offset and sample broadening would be equally good/bad.

To summarize:

gaussian matrix form: (W + DM) I with precomputed W, M
interpolated matrix form: W I' with precomputed W and interpolated I'
standard normal form: G ⋅ I' with G from standard normal and interpolated I'
uniform matrix form: (W + DM) I with W precomputed from box approximation
on-the-fly uniform: B ⋅ I with B weights computed directly from Q_calc and ΔQ'
uniform filter: apply uniform filter to I three times using work arrays

I'm inclined toward (2) as the most practical.

andyfaff commented 3 years ago

I find that Gaussian quadrature works well in terms of performance.

Pros

everything can be vectorised
time taken scales linearly with the quadrature order; the resolution calculation needs a factor of quad_order more reflectivity calculations per point. The majority of time in the convolution is spent doing this, the array multiplications are much less expensive. I believe that the calculation time in refl1d already scales linearly with the degree of oversampling.
the weighting values only need to be calculated once and can be cached.

Cons

might be subject to aliasing effects depending the fringe spacing.

_INT_LIMIT = 3.0

@lru_cache(maxsize=128)
def gauss_legendre(n):
    """
    Calculate gaussian quadrature abscissae and weights

    Parameters
    ----------
    n : int
        Gaussian quadrature order.
    Returns
    -------
    (x, w) : tuple
        The abscissae and weights for Gauss Legendre integration.
    """
    return scipy.special.p_roots(n)

def _smeared_abeles_pointwise(qvals, w, dqvals, quad_order=17):
    """
    Resolution smearing that uses fixed order Gaussian quadrature integration
    for the convolution.

    Parameters
    ----------
    qvals : array-like
        The Q values for evaluation
    w : array-like
        The uniform slab model parameters in 'layer' form.
    dqvals : array-like
        dQ values corresponding to each value in `qvals`.
    quad-order : int, optional
        Specify the order of the Gaussian quadrature integration for the
        convolution.

    Returns
    -------
    reflectivity : np.ndarray
        The smeared reflectivity
    """

    # get the gauss-legendre weights and abscissae
    abscissa, weights = gauss_legendre(quad_order)

    # get the normal distribution at that point
    prefactor = 1.0 / np.sqrt(2 * np.pi)

    def gauss(x):
        return np.exp(-0.5 * x * x)

    gaussvals = prefactor * gauss(abscissa * _INTLIMIT)

    # integration between -3.5 and 3.5 sigma
    va = qvals - _INTLIMIT * dqvals
    vb = qvals + _INTLIMIT * dqvals

    va = va[:, np.newaxis]
    vb = vb[:, np.newaxis]

    qvals_for_res = (np.atleast_2d(abscissa) * (vb - va) + vb + va) / 2.0
    smeared_rvals = abeles(qvals_for_res, w)

    smeared_rvals = np.reshape(smeared_rvals, (qvals.size, abscissa.size))

    smeared_rvals *= np.atleast_2d(gaussvals * weights)
    return np.sum(smeared_rvals, 1) * _INTLIMIT

pkienzle commented 3 years ago

Doesn't this increase the cost of the reflectivity calculation by 17x? You can no longer reuse the q from one point to compute the resolution for the surrounding points. How much worse is the quadrature calculation if you use the measured q values plus some random q values in between? You can still precompute the weights (assuming θ, Δθ don't change too much) so you only have to pay for it once when you load the data.

pkienzle commented 3 years ago

@andyfaff, a separate question: are you sure you want to go to 3.5 σ?

For sasmodels I wrote this:

# According to (Barker & Pedersen 1995 JAC), 2.5 sigma is a good limit.
# According to simulations with github.com:scattering/sansresolution.git
# it is better to use asymmetric bounds (2.5, 3.0)
PINHOLE_N_SIGMA = (2.5, 3.0)

I've also done simulations for reflectometers (github.com:pkienzle/candor.git) that show similar behaviour. At least for continuous sources our resolution function is approximately trapezoidal for Δθ and closer to uniform for Δλ, so pulling from far in the tails is contraindicated, especially when convolving with a function that is changing by an order of magnitude over a short q range.

Presumably the situation is different for TOF, with much longer tails in Δλ. Any sense of how good the gaussian approximation is for your instruments?

andyfaff commented 3 years ago

Yes, it increases the cost of the reflectivity calculation by a factor of quad_order.

However, with the way that we've been operating our TOF-NR there may only be 1 point on either side that is within 1-sigma, 2 within 2-sigma (for the random measurement I selected). This is down to TOF rebinning.

I've not used gaussian quadrature with points that aren't the normal Gauss Legendre points before. You'd certainly want to cache those weights. But I don't think the gain from adding those e.g. 4 points won't be significant enough.

There is an even faster way of convolving that refnx uses, depending on the exact simulation system (it uses less reflectivity calculations than a typical pointwise calc). The dq/q has to be constant. This allows one to calculate a geometrically spaced theoretical model (at higher point density than the data), FFT convolve with a Gaussian distribution, then interpolate back to the data points.

andyfaff commented 3 years ago

Empirically 3.5-sigma seems to work. The gauss quadrature rules do site less points at the periphery. There will be situations where higher quad_order will be required.

I published a paper back in 2013 on resolution functions for NR-TOF, https://doi.org/10.1107/S0021889813021936. Depending on the operating conditions the resolution function can vary from gaussian to trapezoidal.

For a lot of measurements we measure with a dlambda/lambda approximately equal to angular (both 3.3%), which means the resolution function is more Gaussian. I reckon the Gaussian approximation is fine for a large majority of experiments.

However, we can also relax the uniform d_{lambda}/lambda to approx 8% to get lots of flux onto the sample. However, we can't relax the (trapezoidal) angular resolution much past 3%. This makes the resolution function trapezoidal, but with significant wings (there are several wavelength contributions). It has the largest effect for thick samples where: the Kiessig fringe is approx the same width as the resolution function, around critical edges, and for Bragg peaks. Another situation is where you have a very diffuse interface. E.g. a polymer brush in solvent has a slowly decaying concentration to its periphery. In D2O this is a slowly increasing SLD to the bulk. This means that above Qc the signal drops precipitously, similar to a rough surface. This is modelled in a better fashion for Q~Qc by a bespoke resolution kernel for each point.

I'm very interested in writing a resolution kernel (Q, p(Q)) for each Q point into reduced datafiles so these analyses can become more common. canSAS is just about to go down that route; it's on ORSO's horizon, but we're not there yet. It'll be essential for ESS.

The 'exact' kernels modelled in the paper I mentioned are quick to numerically calculate during reduction. A few years ago I checked they were correct with a Monte Carlo experimental simulator that I developed for TOF-NR, https://github.com/refnx/refnx-models/blob/master/platypus-simulate/tof_simulator.py; the agreement was remarkable.

I'm always interested in having deeper conversations in all things resolution concerned. I think it would be good for ORSO to have more discussions on these lines (e.g. about the 3.5-sigma vs 3-sigma, etc)