scipp / scippneutron

Neutron scattering toolkit built using scipp for Data Reduction. Not facility or instrument specific.
https://scipp.github.io/scippneutron/
BSD 3-Clause "New" or "Revised" License
4 stars 3 forks source link

Generalization of unit conversion #117

Closed SimonHeybrock closed 3 years ago

SimonHeybrock commented 3 years ago

Neutron Scattering Unit Conversion

Status

scippneutron.convert (mostly equivalent to mantid.ConvertUnits) does the following:

  1. Compute new coord
  2. Replace/remove old coord
  3. Rename dimension associated with coord
  4. Map coords to attrs

The current syntax is:

scn.convert(da, origin='tof', target='wavelength', scatter=True)

One thing to note here is that in addition to the coord specified using the origin arg, scn.convert finds additional coords containing position information.

Bigger picture

We also need to consider:

Related Mantid algorithms:

Hypothesis

My current hunch is that there may be a better way to think of unit conversion mechanisms in a more modular manner. While the concrete decomposition of the problem is far from clear it might be that a 2-stage or multi-stage approach similar to groupby may be a good pattern:

da.groupby('two_theta').sum('pixel')

Questions

Brainstorming

# Should new coord be computed as separate step?
# scatter in 2 places?
da.coords['wavelength'] = scn.wavelength(da, gravity=True, scatter=True)
# Problem: would need to deal with events manually
da.bins.coords['wavelength'] = scn.wavelength(da.bins, gravity=True, scatter=True)

def wavelength(*, dim, gravity):
    def from_dspacing(*, dspacing, Ltotal, out=None):  # could out arg be used to do in-place convert?
        pass
    def from_tof(*, tof, Ltotal):
        pass
    def convert(func, **kwargs):
        return {'wavelength':func(**kwargs)}  # return dict so scn.convert knows name of new coord
    # must include tof, otherwise further ops prevented
    convert.coords_to_attrs = [] if scatter else [dim, 'position', 'Ltotal', 'source_position', 'sample_position']
    converters = {'tof':partial(convert, func=from_tof), ...}
    return converters[dim]

    Ltotal = lookup('Ltotal')  # walk tree, return coords required to compute it
    def from_tof(*, tof):
        # use tof and coords
        pass

# Two options:
# 1. teach transform_coords how to get/compute coords not in `da`
# 2. teach converter to only request coords that exist
da.transform_coords(converter=scn.wavelength('tof', 'Ltotal'=scn.SimpleGeometry(da)))
da.transform_coords(converter=scn.wavelength('tof', lookup=scn.GravityGeometry(da)))

da.transform_coords(converter=scn.wavelength('tof'), coord_lookup=scn.GravityGeometry)
da.transform_coords(converter=scn.wavelength('tof'), coord_lookup={'Ltotal':custom})
da.transform_coords(converter=sc.CoordConverter(sc.sqrt), coord_lookup={'Ltotal':custom})
# TODO consider xyz->spherical conversion as generic example

# if after conversion from A to B, A is kept:
# - conversion back to A should *not* assume that other coords unchanged (throw if coord exists?)
# - use coord requested by converter
# - same applies for conversion to C:
#   - converter=scn.wavelength('tof')  # use TOF coord, even if current dim is 'dspacing'
#   - converter=scn.wavelength('dspacing')  # do *not assume 'tof' is correct, compute it from 'dspacing' if required as intermediate step

converter = scn.wavelength(dim, gravity=True)  # this returns a *function*
converter = partial(scn.wavelength(dim), Ltotal=scn.Ltotal(gravity=True))

da = scn.convert(da, converter=scn.wavelength('tof', scatter=False), mode='steal')
da = scn.convert(da, origin='tof', target='wavelength', scatter=False, out=da)

def time_of_day(coord_name)
    def from_datetime(*, datetime, out=None):
        return datetime - midnight
    def convert(func, **kwargs):
        return {'time':func(**kwargs)}
    convert.coords_to_attrs = [coord_name]
    convert.inputs = [coord_name]
    return partial(convert, func=from_datetime)

da.coords['time'] = da.coords['datetime'] - midnight  # simple, why do we need the thing below?
da.rename_dims({'datetime':'time'})

# could be implemented in Python?
# only call to converter is expensive (use a C++ abstract class)
# ... or maybe the converter could also be Python, just functions C++?
da.transform_coords(converter=time_of_day(coord_name='datetime'))

scn.convert(da, converter=scn.wavelength('tof', scatter=False), rename_dim=False, mode='steal'|'keep'|'replace')
scn.convert(da, converter=scn.wavelength('tof', scatter=False), rename_dim=False, inplace=True)
scn.convert(da, converter=scn.wavelength('dspacing', scatter=False), rename_dim=False, inplace=True)
scn.convert(da, converter=scn.wavelength('tof', scatter=False), rename_dim=False)  # tof is removed; reason: coords are dependent on each other, no way to detect that one may be out of date
scn.convert(da, converter=scn.wavelength('tof', scatter=False), rename_dim=False, keep=['tof'])  # tof is kept
scn.convert(da, converters=[scn.qx('tof'), scn.qy('tof')])  # verbose, and extra work
scn.convert(da, converter=scn.qxy('tof'))  # concise and faster, qxy returns tuple; cannot rename dim

scn.make_coords(da, converter=scn.qxy('tof')).apply()  # no dim rename or attr mapping?
scn.make_coords(da, converter=scn.qxy('tof')).convert(rename_dim=False)  # second step does 3 and 4
scn.make_coords(da, converter=scn.qxy('tof')).convert(rename_dim=True)  # second step does 3 and 4

# Instead: adds dense and event coords
da = scn.compute_wavelength(da, gravity=True, scatter=True)
scn.convert(da, 'tof', 'wavelength', scatter=True) # rename dims, coord -> attrs

# should we always rename and do attr conv?
# keep original coord?
scn.abc(da, gravity=True, scatter=True).compute()
scn.abc(da, gravity=True, scatter=True).rename('wavelength')  # do attr conv?

# separate step (after attr conversion, reduction cannot replace attr conv, since not all attrs depend on pixel)
da.rebin(edges=wav_edges)
da.histogram(edges=Q_edges).sum('pixel')
da.histogram(edges=[Qx, Qy]).sum('pixel')  # > 100 GByte temporary
da.bin(edges=[Qx, Qy], erase=['pixel'])
da.bin(edges=[Qx, Qy, Qz], erase=['pixel'])  # SXD
da.bin(edges=[Qx, Qy, Qz, DeltaE], erase=['pixel'])  # inelastic
da.rebin_polygon(edges=[Q, DeltaE], erase=['pixel'])  # SofQW

scn.convert(da, coord='wavelength').rename_dims({'tof':'wavelength'})
scn.convert(da, coord='wavelength').rebin('wavelength')
scn.convert(da, coord='Q').histogram(edges='Q').sum('pixel')

# 1. Add new coord (depending on existing dims)
# 2. Replace/remove old coord
# 3. Rename dimension associated with coord | map dimension
# 4. Map coords to attrs | reduce

Conclusion

SimonHeybrock commented 3 years ago

Next step here: Turn convert into a Python function, which simply calls sc.transform_coords. To do so:

  1. Make https://github.com/scipp/scipp/issues/2064 is available.
  2. Add conversion steps (this includes beamline helpers as well as the actual "unit conversions") implemented as Python functions. For cases where transform (or transform_in_place) is called, implement as C++ function and add Python bindings if it is required to avoid (large) temporaries.
  3. Define graphs for the common conversions.
  4. Turn convert into a wrapper for sc.transform_coords using the defined graphs.
  5. Add deprecation warning in convert, recommend direct use of sc.transform_coords (don't do this until we have settled on a API for accessing/getting graphs).

Note that this should be merged into the next branch, since sc.transform_coords is not available in a release yet.

SimonHeybrock commented 3 years ago

See ongoing PRs.