NXdata and scatter plots

woutdenolf commented 1 year ago

The issue

At the ESRF we store all multi-dimensional scans in a flat way. For example suppose you do a 2D mesh scan of 25 x 30 points, then our data looks like this (example for 2 positioners and 3 detectors but it can be n positioners and m detectors in general):

  x: float[750]
  x@units: "um"
  y: float[750]
  y@units: "um"
  diode: float[750]
  mca: float[750, 1024]
  image: float[750, 1024, 1024]

For 0D detectors we make NXdata groups like this (see the plots below to understand what the intention is):

2dscan:nxdata
  @axes: ["x", "y"]
  @signal: "diode"
  x: float[750]
  x@units: "um"
  y: float[750]
  y@units: "um"
  diode: float[750]

However, this is NOT a valid NXdata group, because the number of axes (2) is not equal to the number of signal dimensions (1).
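The mismatch can be stated as a one-line check. This is only a minimal sketch: the helper name `axes_match_signal_rank` is made up for illustration and is not part of any NeXus or silx API, and it only compares the number of axis names with the signal rank.

```python
def axes_match_signal_rank(axes, signal_shape):
    """Return True when there is exactly one axis name per signal dimension."""
    return len(axes) == len(signal_shape)


# The flat 2D scan from the group above: two axes, one signal dimension.
flat_ok = axes_match_signal_rank(["x", "y"], (750,))
# The same scan stored on a 25 x 30 grid: two axes, two signal dimensions.
grid_ok = axes_match_signal_rank(["x", "y"], (25, 30))
print(flat_ok, grid_ok)
```

The flat layout fails the check and the gridded layout passes it.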

The question

So it seems the only valid way to store this data in an NXdata is this?

2dscan:nxdata
  @axes: ["x"]
  @signal: "diode"
  x: float[750]
  x@units: "um"
  x_indices: 0
  y: float[750]
  y@units: "um"
  y_indices: 0
  diode: float[750]

This is not useful, because a reader needs to look at all field names to understand that the first and only dimension has two axes associated with it. And even if a reader does this, it cannot tell that the axes span different plot dimensions rather than the same plot dimension (where x and y would be alternative coordinates for the same single dimension).
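To make the reader's problem concrete, here is a rough sketch of what such a reader would have to do. It assumes the group attributes are available as a plain dict (a stand-in for something like h5py attributes), and `axes_per_dimension` is a hypothetical helper, not an existing API.

```python
def axes_per_dimension(attrs, signal_rank):
    """Group axis names by the signal dimension(s) their *_indices entry lists."""
    suffix = "_indices"
    per_dim = {dim: [] for dim in range(signal_rank)}
    for name, value in sorted(attrs.items()):
        if not name.endswith(suffix):
            continue
        # An *_indices value may be a single integer or a list of integers.
        dims = value if isinstance(value, (list, tuple)) else [value]
        for dim in dims:
            per_dim[dim].append(name[: -len(suffix)])
    return per_dim


# Assumed stand-in for the attributes of the group shown above.
attrs = {"signal": "diode", "axes": ["x"], "x_indices": 0, "y_indices": 0}
mapping = axes_per_dimension(attrs, signal_rank=1)
print(mapping)  # {0: ['x', 'y']}
```

The result only says that both x and y belong to dimension 0. Nothing in it distinguishes "two coordinates of a scatter plot" from "two alternative labels for one curve axis".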

Note that storing the data like this doesn't help either:

2dscan:nxdata
  @axes: ["x"]
  @signal: "diode"
  x: float[25, 30]
  x@units: "um"
  x_indices: [0, 1]
  y: float[25, 30]
  y@units: "um"
  y_indices: [0, 1]
  diode: float[25, 30]

Of course this works, but only if the x-y coordinates form a perfectly regular grid (which is never the case for measured data):

2dscan:nxdata
  @axes: ["x", "y"]
  @signal: "diode"
  x: float[25]
  x@units: "um"
  x_indices: 0
  y: float[30]
  y@units: "um"
  y_indices: 1
  diode: float[25, 30]
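Whether a flat scan can be collapsed into this regular-grid form can be tested directly. The sketch below is an illustration under stated assumptions: row-major acquisition order (the fast axis varies within each row), made-up sample values, and a hypothetical helper name `is_regular_grid`.

```python
def is_regular_grid(slow, fast, n_slow, n_fast, tol=0.0):
    """Check whether flat scan coordinates lie on an n_slow x n_fast grid.

    Assumes row-major order: the fast axis varies within each row.
    """
    if len(slow) != n_slow * n_fast or len(fast) != n_slow * n_fast:
        return False
    for i in range(n_slow):
        for j in range(n_fast):
            k = i * n_fast + j
            # The slow position must be constant within a row.
            if abs(slow[k] - slow[i * n_fast]) > tol:
                return False
            # The fast position must repeat from row to row.
            if abs(fast[k] - fast[j]) > tol:
                return False
    return True


# Requested positions of a 2 x 3 scan (ideal grid).
slow_set = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
fast_set = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
# Hypothetical readbacks with small motor deviations on the fast axis.
fast_read = [0.0, 0.51, 1.0, 0.02, 0.5, 0.99]

ideal = is_regular_grid(slow_set, fast_set, 2, 3)
measured = is_regular_grid(slow_set, fast_read, 2, 3)
loose = is_regular_grid(slow_set, fast_read, 2, 3, tol=0.05)
print(ideal, measured, loose)
```

The requested positions pass, the readbacks fail the exact test, and they pass only once a tolerance is accepted, which is the step that throws away the measured coordinates.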

Silx scatter plot visualizations

Currently silx supports NXdata groups like this (not NeXus compliant):

2dscan:nxdata
  @axes: ["fastaxis", "slowaxis"]
  @signal: "diode"
  fastaxis: float[2800]
  slowaxis: float[2800]
  diode: float[2800]

which are shown as a scatter plot

or resampled on-the-fly on a regular grid

Generate data

import numpy
import h5py
import imageio
from numpy.random import uniform
from scipy import ndimage

rgb = imageio.imread("https://github.com/scikit-image/scikit-image/raw/ce707744e84e631aa9e014559051cb123f7a65ce/skimage/data/ihc.png")
r, g, b = rgb.T
im = 0.2989 * r + 0.5870 * g + 0.1140 * b

# Sampling coordinates
d = -3  # random motor deviations; negative, so the grid overshoots the image by 2*|d|
x0 = numpy.linspace(2*d, im.shape[0]-1-2*d, 70)
x1 = numpy.linspace(2*d, im.shape[1]-1-2*d, 40)
fastaxis, slowaxis = numpy.meshgrid(x0, x1, indexing='xy')
fastaxis = fastaxis.flatten()
slowaxis = slowaxis.flatten()
# d is negative, so the jitter interval is [d, -d) (low must not exceed high)
fastaxis += uniform(low=d, high=-d, size=fastaxis.size)
slowaxis += uniform(low=d, high=-d, size=slowaxis.size)

# Image sampling
z = ndimage.map_coordinates(im, [fastaxis, slowaxis], order=1, cval=numpy.nan)

# Save as NXdata
with h5py.File("test.h5", mode="w") as root:
    scan = root.create_group("scan1")
    data = scan.create_group("data")
    root.attrs["NX_class"] = "NXroot"
    scan.attrs["NX_class"] = "NXentry"
    data.attrs["NX_class"] = "NXdata"
    root.attrs["default"] = "scan1"
    scan.attrs["default"] = "data"
    data.attrs["axes"] = ["fastaxis", "slowaxis"]
    data.attrs["signal"] = "data"
    data["slowaxis"] = slowaxis
    data["fastaxis"] = fastaxis
    data["data"] = z

PeterC-DLS commented 1 year ago

Reshaping flattened scans is definitely needed. Maybe a quantized scan/position N-D grid plus a sequence will meet the requirements. How do we formulate this?
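One possible reading of "quantized grid and sequence" is sketched below. It is only an illustration of the idea, not an existing NeXus feature: the helper `quantize`, the nominal grid parameters (`start`, `step`, `count`) and the readback values are all assumptions made up for the example.

```python
def quantize(values, start, step, count):
    """Map each readback to the index of the nearest nominal grid position."""
    indices = []
    for value in values:
        index = round((value - start) / step)
        indices.append(min(max(index, 0), count - 1))  # clamp to the grid
    return indices


# Hypothetical readbacks of a 2 x 3 scan, in acquisition order.
x_read = [0.0, 0.51, 1.0, 0.02, 0.5, 0.99]
y_read = [0.0, 0.01, 0.0, 1.0, 0.99, 1.01]

ix = quantize(x_read, start=0.0, step=0.5, count=3)
iy = quantize(y_read, start=0.0, step=1.0, count=2)
# Each entry is (sequence number, grid index along y, grid index along x).
cells = [(n, j, i) for n, (j, i) in enumerate(zip(iy, ix))]
print(cells)
```

The flat arrays keep the measured positions, while the per-point grid indices tell a reader how to reshape the scan.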

prjemian commented 1 year ago

Isn't a flattened scan a set of arrays that are not likely to be monotonic in some of the independent variables (dimension scales)?

prjemian commented 1 year ago

APS uses the EPICS sscan record for many of its scanning operations. There is an associated data collector process that saves the data into a custom MDA format (based on xdr). We want to translate from MDA to NeXus, assuming as little processing as necessary.

The data from this data acquisition can be presented in the MDA file in ways that are most easily described in NeXus using the _indices notation. For example, a 2-D point-wise (raster) scan may not be represented on a regular 2-D grid due to repositioning of the fast axis for each increment of the slow axis (row-by-row, the columns are not at identical values of the independent positioner because the position readback sensor value does not always match the intended position). While the intended positions are on a regular grid, it is not reliable to assume the readbacks are on a regular grid, yet it is the readbacks we wish to use for visualization.

prjemian commented 1 year ago

The wording in the @AXISNAME_indices section is inconsistent, as noted above. The inconsistency comes from existing wording not being updated as new features, such as the @AXISNAME_indices attribute, were added to NXdata. This attribute was added to the standard by an approving NIAC vote, and deprecating it should take a similar vote. Changing the wording does not require such a vote. The documentation says:

model of “strict writer, liberal reader”

The documentation should be revised with this in mind. This writing is confusing and a likely motivation for this issue in particular:

This attribute is to be provided in all situations. However, if the indices attributes are missing ...

Written as a committee decision, this looks confusing when viewed at this time. Please help in clarifying the documentation.
