SimonHeybrock commented 3 years ago

Neutron Scattering Unit Conversion

Status

scippneutron.convert (mostly equivalent to mantid.ConvertUnits) does the following:

Compute new coord
Replace/remove old coord
Rename dimension associated with coord
Map coords to attrs

The current syntax is:

scn.convert(da, origin='tof', target='wavelength', scatter=True)

One thing to note here is that in addition to the coord specified using the origin arg, scn.convert finds additional coords containing position information.

Bigger picture

We also need to consider:

Conversions based on user parameters and functions, see https://docs.mantidproject.org/nightly/algorithms/ConvertDiffCal-v1.html
Gravity corrections.
SANS conversion to Q, which is completing by histogramming into Q bins and reducing over pixel dimensions
SANS conversion to Qxy, which is completing by histogramming into [Qx, Qy] bins and reducing over pixel dimensions
Reflectometry with a number of different cases
SofQW, which is another case of a "2d" transform
Single-crystal conversions, involving computation of $Q_x$, $Q_y$, $Q_z$ and binning into corresponding dimensions
Potential conversion to bin centers?
Reversing order in inelastic conversions?
Dealing with unphysical regions (inelastic conversions)
Introduction of multi-dim coord by unit conversions, which are sometimes hard to deal with

Related Mantid algorithms:

ConvertUnits
SofQW (this is a family of algorithms
Q1D and Qxy for SANS
Reflectometry?
Various ConvertToMD algorithms

Hypothesis

My current hunch is that there may be a better way to think of unit conversion mechanisms in a more modular manner. While the concrete decomposition of the problem is far from clear it might be that a 2-stage or multi-stage approach similar to groupby may be a good pattern:

da.groupby('two_theta').sum('pixel')

Questions

Are there performance or memory reasons for combining the various steps?

Brainstorming

# Should new coord be computed as separate step?
# scatter in 2 places?
da.coords['wavelength'] = scn.wavelength(da, gravity=True, scatter=True)
# Problem: would need to deal with events manually
da.bins.coords['wavelength'] = scn.wavelength(da.bins, gravity=True, scatter=True)

def wavelength(*, dim, gravity):
    def from_dspacing(*, dspacing, Ltotal, out=None):  # could out arg be used to do in-place convert?
        pass
    def from_tof(*, tof, Ltotal):
        pass
    def convert(func, **kwargs):
        return {'wavelength':func(**kwargs)}  # return dict so scn.convert knows name of new coord
    # must include tof, otherwise further ops prevented
    convert.coords_to_attrs = [] if scatter else [dim, 'position', 'Ltotal', 'source_position', 'sample_position']
    converters = {'tof':partial(convert, func=from_tof), ...}
    return converters[dim]

    Ltotal = lookup('Ltotal')  # walk tree, return coords required to compute it
    def from_tof(*, tof):
        # use tof and coords
        pass

# Two options:
# 1. teach transform_coords how to get/compute coords not in `da`
# 2. teach converter to only request coords that exist
da.transform_coords(converter=scn.wavelength('tof', 'Ltotal'=scn.SimpleGeometry(da)))
da.transform_coords(converter=scn.wavelength('tof', lookup=scn.GravityGeometry(da)))

da.transform_coords(converter=scn.wavelength('tof'), coord_lookup=scn.GravityGeometry)
da.transform_coords(converter=scn.wavelength('tof'), coord_lookup={'Ltotal':custom})
da.transform_coords(converter=sc.CoordConverter(sc.sqrt), coord_lookup={'Ltotal':custom})
# TODO consider xyz->spherical conversion as generic example

# if after conversion from A to B, A is kept:
# - conversion back to A should *not* assume that other coords unchanged (throw if coord exists?)
# - use coord requested by converter
# - same applies for conversion to C:
#   - converter=scn.wavelength('tof')  # use TOF coord, even if current dim is 'dspacing'
#   - converter=scn.wavelength('dspacing')  # do *not assume 'tof' is correct, compute it from 'dspacing' if required as intermediate step

converter = scn.wavelength(dim, gravity=True)  # this returns a *function*
converter = partial(scn.wavelength(dim), Ltotal=scn.Ltotal(gravity=True))

da = scn.convert(da, converter=scn.wavelength('tof', scatter=False), mode='steal')
da = scn.convert(da, origin='tof', target='wavelength', scatter=False, out=da)

def time_of_day(coord_name)
    def from_datetime(*, datetime, out=None):
        return datetime - midnight
    def convert(func, **kwargs):
        return {'time':func(**kwargs)}
    convert.coords_to_attrs = [coord_name]
    convert.inputs = [coord_name]
    return partial(convert, func=from_datetime)

da.coords['time'] = da.coords['datetime'] - midnight  # simple, why do we need the thing below?
da.rename_dims({'datetime':'time'})

# could be implemented in Python?
# only call to converter is expensive (use a C++ abstract class)
# ... or maybe the converter could also be Python, just functions C++?
da.transform_coords(converter=time_of_day(coord_name='datetime'))

scn.convert(da, converter=scn.wavelength('tof', scatter=False), rename_dim=False, mode='steal'|'keep'|'replace')
scn.convert(da, converter=scn.wavelength('tof', scatter=False), rename_dim=False, inplace=True)
scn.convert(da, converter=scn.wavelength('dspacing', scatter=False), rename_dim=False, inplace=True)
scn.convert(da, converter=scn.wavelength('tof', scatter=False), rename_dim=False)  # tof is removed; reason: coords are dependent on each other, no way to detect that one may be out of date
scn.convert(da, converter=scn.wavelength('tof', scatter=False), rename_dim=False, keep=['tof'])  # tof is kept
scn.convert(da, converters=[scn.qx('tof'), scn.qy('tof')])  # verbose, and extra work
scn.convert(da, converter=scn.qxy('tof'))  # concise and faster, qxy returns tuple; cannot rename dim

scn.make_coords(da, converter=scn.qxy('tof')).apply()  # no dim rename or attr mapping?
scn.make_coords(da, converter=scn.qxy('tof')).convert(rename_dim=False)  # second step does 3 and 4
scn.make_coords(da, converter=scn.qxy('tof')).convert(rename_dim=True)  # second step does 3 and 4

# Instead: adds dense and event coords
da = scn.compute_wavelength(da, gravity=True, scatter=True)
scn.convert(da, 'tof', 'wavelength', scatter=True) # rename dims, coord -> attrs

# should we always rename and do attr conv?
# keep original coord?
scn.abc(da, gravity=True, scatter=True).compute()
scn.abc(da, gravity=True, scatter=True).rename('wavelength')  # do attr conv?

# separate step (after attr conversion, reduction cannot replace attr conv, since not all attrs depend on pixel)
da.rebin(edges=wav_edges)
da.histogram(edges=Q_edges).sum('pixel')
da.histogram(edges=[Qx, Qy]).sum('pixel')  # > 100 GByte temporary
da.bin(edges=[Qx, Qy], erase=['pixel'])
da.bin(edges=[Qx, Qy, Qz], erase=['pixel'])  # SXD
da.bin(edges=[Qx, Qy, Qz, DeltaE], erase=['pixel'])  # inelastic
da.rebin_polygon(edges=[Q, DeltaE], erase=['pixel'])  # SofQW

scn.convert(da, coord='wavelength').rename_dims({'tof':'wavelength'})
scn.convert(da, coord='wavelength').rebin('wavelength')
scn.convert(da, coord='Q').histogram(edges='Q').sum('pixel')

# 1. Add new coord (depending on existing dims)
# 2. Replace/remove old coord
# 3. Rename dimension associated with coord | map dimension
# 4. Map coords to attrs | reduce

Conclusion

Add transform_coords to `scipp.
Can be Python-only, individual conversion steps can be implemented as C++ function with Python bindings if required for performance reasons.
scippneutron defined building blocks for conversion graphs, as well a common graphs. Graphs in https://scipp.github.io/scippneutron/user-guide/unit-conversions.html#Beamline-geometry-parameters-used-in-unit-conversions are the first example of such a graph.

SimonHeybrock commented 3 years ago

Next step here: Turn convert into a Python function, which simply calls sc.transform_coords. To do so:

Make https://github.com/scipp/scipp/issues/2064 is available.
Add conversion steps (this includes beamline helpers as well as the actual "unit conversions") implemented as Python functions. For cases where transform (or transform_in_place) is called, implement as C++ function and add Python bindings if it is required to avoid (large) temporaries.
Define graphs for the common conversions.
Turn convert into a wrapper for sc.transform_coords using the defined graphs.
Add deprecation warning in convert, recommend direct use of sc.transform_coords (don't do this until we have settled on a API for accessing/getting graphs).

Note that this should be merged into the next branch, since sc.transform_coords is not available in a release yet.

SimonHeybrock commented 3 years ago

See ongoing PRs.

scipp / scippneutron

Generalization of unit conversion #117

Neutron Scattering Unit Conversion

Status

Bigger picture

Hypothesis

Questions

Brainstorming

Conclusion