nexusformat / definitions

Definitions of the NeXus Standard File Structure and Contents
https://manual.nexusformat.org/
Other
26 stars 56 forks source link

NXdata: improve documentation regarding axes #1213

Closed woutdenolf closed 10 months ago

woutdenolf commented 1 year ago

Closes #1212

In this PR we refactor the NXdata definition to resolve inconsistencies in definition of @axes, AXISNAME_indices and AXISNAME. NIAC discussions:

The proposed changes do not change anything to the current NXdata, they just clear up ambiguity:

Proper code examples with be added in a separate merge request: https://github.com/nexusformat/definitions/issues/1253

A python reader would have to do something like this to figure out what all the axes are and to which data dimensions they belong

from typing import Dict, List, Sequence
import h5py

def get_axes_dims(nxdata: h5py.Group) -> Dict[str,List[int]]:
    axes_data_dims = dict()
    axes = nxdata.attrs.get("axes", list())
    if isinstance(axes, str):
        axes = [axes]
    for axisname in set(axes):
        if axisname == ".":
            continue
        data_dims = nxdata.attrs.get(f"{axisname}_indices", None)
        if data_dims is None:
            data_dims = [i for i,s in enumerate(axes) if s == axisname]
        elif not isinstance(data_dims, Sequence):
            data_dims = [data_dims]
        else:
            data_dims = list(data_dims)
        axes_data_dims[axisname] = data_dims
    return axes_data_dims

Rendering of the PR

https://hdf5.gitlab-pages.esrf.fr/nexus/nxdata_axes/classes/base_classes/NXdata.html

phyy-nx commented 1 year ago

Discussion from Code Camp:

During code review we realized this is an extensive change. We have formed a subcommittee (@phyy-nx, @rayosborn, @woutdenolf, @sanbrock), to review the edited page and provide feedback here by Tuesday's set of sessions. Here is the edited page rendered in html by @woutdenolf

Note from @yayahjb, ensure this PR doesn't change functionality, just documentation.

woutdenolf commented 1 year ago

Discussion from Code Camp:

* Instead of replacing `dimension scale` with `coordinates`, use `axis coordinates`.

* "also referred to as signals or dependent variables", change to "sometimes referred to as signals or dependent variables", same with the next sentence.

Done.

rayosborn commented 1 year ago

I would suggest that the first paragraph doesn't need to describe the motivation for creating the NXdata group, which is of purely historical interest and might cause confusion when NXdata groups are encountered outside NXentry groups. In fact, NXdata groups can be used whenever data is to be plotted.

In the first paragraph, I would suggest the following:

"The NXdata class is designed to encapsulate all the information required for a set of data to be plotted. NXdata groups contain plottable data (sometimes referred to as signals or dependent variables) and their associated axis coordinates (sometimes referred to as axes or independent variables)."

rayosborn commented 1 year ago

I strongly disagree with the sentence stating that it "is not recommended to omit the AXISNAME_indices attributes ..." I am still waiting to see an actual example of a file where the rank of an axis field is greater than 1 - I mean a real-world example, not a hypothetical possibility. I think the following text conveys standard NeXus usage, and is less confusing to newcomers.

"The AXISNAME fields contain the coordinates associated with the data values. AXISNAME fields are typically one-dimensional arrays, which annotate one of the dimensions. However, the fields can also have a rank greater than 1, in which case the rank of each AXISNAME must be equal to the number of dimensions it spans.

The names of the AXISNAME fields to be used as axis coordinates are listed in the axes attribute. When all the AXISNAME fields are one-dimensional, their positions in the axes attribute correspond to the data dimension being annotated. If one of the data dimensions has no AXISNAME field, the string “.” can be used in the corresponding index of the axes list.

When the rank of any of the AXISNAME fields is greater than 1, the AXISNAME_indices attributes is used to define the data dimension(s) spanned by the corresponding AXISNAME fields. It is strongly discouraged to define the AXISNAME_indices attribute for some AXISNAME fields but not for others."

woutdenolf commented 1 year ago

I strongly disagree with the sentence stating that it "is not recommended to omit the AXISNAME_indices attributes ..."

I agree. For reference, that sentence came from the original:

@AXISNAME_indices: (optional) NX_INT ... This attribute is to be provided in all situations. However, if the indices attributes are missing (such as for data files written before this specification), file readers are encouraged to make their best efforts to plot the data.

So I had already softened that statement. But I agree with @rayosborn to soften it even more.

At the ESRF we are only now looking into supporting AXISNAME_indices because we have use cases for multi-dimensional axes (e.g. a stack of scatter plots ). Nevertheless it will remain a niche imo.

woutdenolf commented 1 year ago

@phyy-nx, @sanbrock if you agree I will apply @rayosborn's two proposals.

phyy-nx commented 1 year ago

Updates from next code camp section:

Also, add a simpler example of the basic case:

data:NXdata
    @signal = "data"
    @axes = ["x", "y"]      --> the order does matter
    data: float[10,20]      --> the default signal
    x: float[10]            --> coordinates along the first dimension
    y: float[20]            --> coordinates along the second dimension
woutdenolf commented 1 year ago

Changes:

woutdenolf commented 1 year ago

For reference, I took @rayosborn's description of axes https://github.com/nexusformat/definitions/pull/1213#issuecomment-1597849364

Axes:

The AXISNAME fields contain the coordinates associated with the data values. AXISNAME fields are typically one-dimensional arrays, which annotate one of the dimensions. However, the fields can also have a rank greater than 1, in which case the rank of each AXISNAME must be equal to the number of dimensions it spans.

The names of the AXISNAME fields to be used as axis coordinates are listed in the axes attribute. When all the AXISNAME fields are one-dimensional, their positions in the axes attribute correspond to the data dimension being annotated. If one of the data dimensions has no AXISNAME field, the string “.” can be used in the corresponding index of the axes list.

When the rank of any of the AXISNAME fields is greater than 1, the AXISNAME_indices attributes is used to define the data dimension(s) spanned by the corresponding AXISNAME fields. It is strongly discouraged to define the AXISNAME_indices attribute for some AXISNAME fields but not for others."

and merged it with my original wording to obtain this

Axes:

The AXISNAME fields contain the axis coordinates associated with the data values. The names of all AXISNAME fields are listed in the axes attribute.

Rank

AXISNAME fields are typically one-dimensional arrays, which annotate one of the dimensions. However, the fields can also have a rank greater than 1, in which case the rank of each AXISNAME must be equal to the number of data dimensions it spans.

Dimensions

The data dimensions annotated by an AXISNAME field are defined by the AXISNAME_indices attribute. When this attribute is missing, the position(s) of the AXISNAME string in the axes attribute are used.

When all AXISNAME fields are one-dimensional, and none of the data dimensions have more than one axis, the AXISNAME_indices attributes are often omitted. If one of the data dimensions has no AXISNAME field, the string “.” can be used in the corresponding index of the axes list.

When providing AXISNAME_indices attributes it is recommended to do it for all axes.

I mainly had an issue with this

When all the AXISNAME fields are one-dimensional, their positions in the axes attribute correspond to the data dimension being annotated. ... When the rank of any of the AXISNAME fields is greater than 1, the AXISNAME_indices attributes is used to define the data dimension(s) spanned by the corresponding AXISNAME fields.

This seems to imply the AXISNAME_indices are to be ignored when you only have rank 1 axes. This is logically very hard to implement mostly for HDF5 readers. In the merged version this becomes

The data dimensions annotated by an AXISNAME field are defined by the AXISNAME_indices attribute. When this attribute is missing, the position(s) of the AXISNAME string in the axes attribute are used.

This is logically much simpler. @rayosborn's description in spirit wants to say that most of our NXdata groups do not has AXISNAME_indices. So I added this paragraph

When all AXISNAME fields are one-dimensional, and none of the data dimensions have more than one axis, the AXISNAME_indices attributes are often omitted. If one of the data dimensions has no AXISNAME field, the string “.” can be used in the corresponding index of the axes list.

woutdenolf commented 1 year ago

To accommodate @phyy-nx who prefer positive wording I replaced

It is strongly discouraged to define the AXISNAME_indices attribute for some AXISNAME fields but not for others.

with

When providing AXISNAME_indices attributes it is recommended to do it for all axes.

woutdenolf commented 1 year ago

Split the Axes section into "simple" and "advanced" use cases

I did the splitting in another way with subsections rank and dimensions. The axes section should be easier to read now.

phyy-nx commented 1 year ago

Code review:

woutdenolf commented 1 year ago

Applied the requested modifications from https://github.com/nexusformat/definitions/pull/1213#issuecomment-1600934279: examples are merged with the doc sections and adapted to match those sections.

phyy-nx commented 11 months ago

From telco, let's get the same subcommittee mentioned above to re-review and approve the PR (@phyy-nx @woutdenolf @rayosborn @sanbrock). Further reviews are welcome. Target is November 2nd. Thanks!

phyy-nx commented 10 months ago

Ok! Based on Telcos and these approvals, @woutdenolf feel free to merge at your convenience. Thanks!

woutdenolf commented 10 months ago

Thanks for the review everyone!