Closed woutdenolf closed 10 months ago
Discussion from Code Camp:
dimension scale
with coordinates
, use axis coordinates
.During code review we realized this is an extensive change. We have formed a subcommittee (@phyy-nx, @rayosborn, @woutdenolf, @sanbrock), to review the edited page and provide feedback here by Tuesday's set of sessions. Here is the edited page rendered in html by @woutdenolf
Note from @yayahjb, ensure this PR doesn't change functionality, just documentation.
Discussion from Code Camp:
* Instead of replacing `dimension scale` with `coordinates`, use `axis coordinates`. * "also referred to as signals or dependent variables", change to "sometimes referred to as signals or dependent variables", same with the next sentence.
Done.
I would suggest that the first paragraph doesn't need to describe the motivation for creating the NXdata group, which is of purely historical interest and might cause confusion when NXdata groups are encountered outside NXentry groups. In fact, NXdata groups can be used whenever data is to be plotted.
In the first paragraph, I would suggest the following:
"The NXdata class is designed to encapsulate all the information required for a set of data to be plotted. NXdata groups contain plottable data (sometimes referred to as signals or dependent variables) and their associated axis coordinates (sometimes referred to as axes or independent variables)."
I strongly disagree with the sentence stating that it "is not recommended to omit the AXISNAME_indices attributes ..." I am still waiting to see an actual example of a file where the rank of an axis field is greater than 1 - I mean a real-world example, not a hypothetical possibility. I think the following text conveys standard NeXus usage, and is less confusing to newcomers.
"The AXISNAME fields contain the coordinates associated with the data values. AXISNAME fields are typically one-dimensional arrays, which annotate one of the dimensions. However, the fields can also have a rank greater than 1, in which case the rank of each AXISNAME must be equal to the number of dimensions it spans.
The names of the AXISNAME fields to be used as axis coordinates are listed in the axes attribute. When all the AXISNAME fields are one-dimensional, their positions in the axes attribute correspond to the data dimension being annotated. If one of the data dimensions has no AXISNAME field, the string “.” can be used in the corresponding index of the axes list.
When the rank of any of the AXISNAME fields is greater than 1, the AXISNAME_indices attributes is used to define the data dimension(s) spanned by the corresponding AXISNAME fields. It is strongly discouraged to define the AXISNAME_indices attribute for some AXISNAME fields but not for others."
I strongly disagree with the sentence stating that it "is not recommended to omit the AXISNAME_indices attributes ..."
I agree. For reference, that sentence came from the original:
@AXISNAME_indices: (optional) NX_INT ... This attribute is to be provided in all situations. However, if the indices attributes are missing (such as for data files written before this specification), file readers are encouraged to make their best efforts to plot the data.
So I had already softened that statement. But I agree with @rayosborn to soften it even more.
At the ESRF we are only now looking into supporting AXISNAME_indices
because we have use cases for multi-dimensional axes (e.g. a stack of scatter plots ). Nevertheless it will remain a niche imo.
@phyy-nx, @sanbrock if you agree I will apply @rayosborn's two proposals.
Updates from next code camp section:
Also, add a simpler example of the basic case:
data:NXdata
@signal = "data"
@axes = ["x", "y"] --> the order does matter
data: float[10,20] --> the default signal
x: float[10] --> coordinates along the first dimension
y: float[20] --> coordinates along the second dimension
Changes:
For reference, I took @rayosborn's description of axes https://github.com/nexusformat/definitions/pull/1213#issuecomment-1597849364
Axes:
The AXISNAME fields contain the coordinates associated with the data values. AXISNAME fields are typically one-dimensional arrays, which annotate one of the dimensions. However, the fields can also have a rank greater than 1, in which case the rank of each AXISNAME must be equal to the number of dimensions it spans.
The names of the AXISNAME fields to be used as axis coordinates are listed in the axes attribute. When all the AXISNAME fields are one-dimensional, their positions in the axes attribute correspond to the data dimension being annotated. If one of the data dimensions has no AXISNAME field, the string “.” can be used in the corresponding index of the axes list.
When the rank of any of the AXISNAME fields is greater than 1, the AXISNAME_indices attributes is used to define the data dimension(s) spanned by the corresponding AXISNAME fields. It is strongly discouraged to define the AXISNAME_indices attribute for some AXISNAME fields but not for others."
and merged it with my original wording to obtain this
Axes:
The AXISNAME fields contain the axis coordinates associated with the data values. The names of all AXISNAME fields are listed in the axes attribute.
Rank
AXISNAME fields are typically one-dimensional arrays, which annotate one of the dimensions. However, the fields can also have a rank greater than 1, in which case the rank of each AXISNAME must be equal to the number of data dimensions it spans.
Dimensions
The data dimensions annotated by an AXISNAME field are defined by the AXISNAME_indices attribute. When this attribute is missing, the position(s) of the AXISNAME string in the axes attribute are used.
When all AXISNAME fields are one-dimensional, and none of the data dimensions have more than one axis, the AXISNAME_indices attributes are often omitted. If one of the data dimensions has no AXISNAME field, the string “.” can be used in the corresponding index of the axes list.
When providing AXISNAME_indices attributes it is recommended to do it for all axes.
I mainly had an issue with this
When all the AXISNAME fields are one-dimensional, their positions in the axes attribute correspond to the data dimension being annotated. ... When the rank of any of the AXISNAME fields is greater than 1, the AXISNAME_indices attributes is used to define the data dimension(s) spanned by the corresponding AXISNAME fields.
This seems to imply the AXISNAME_indices are to be ignored when you only have rank 1 axes. This is logically very hard to implement mostly for HDF5 readers. In the merged version this becomes
The data dimensions annotated by an AXISNAME field are defined by the AXISNAME_indices attribute. When this attribute is missing, the position(s) of the AXISNAME string in the axes attribute are used.
This is logically much simpler. @rayosborn's description in spirit wants to say that most of our NXdata groups do not has AXISNAME_indices. So I added this paragraph
When all AXISNAME fields are one-dimensional, and none of the data dimensions have more than one axis, the AXISNAME_indices attributes are often omitted. If one of the data dimensions has no AXISNAME field, the string “.” can be used in the corresponding index of the axes list.
To accommodate @phyy-nx who prefer positive wording I replaced
It is strongly discouraged to define the AXISNAME_indices attribute for some AXISNAME fields but not for others.
with
When providing AXISNAME_indices attributes it is recommended to do it for all axes.
Split the Axes section into "simple" and "advanced" use cases
I did the splitting in another way with subsections rank and dimensions. The axes section should be easier to read now.
Code review:
Applied the requested modifications from https://github.com/nexusformat/definitions/pull/1213#issuecomment-1600934279: examples are merged with the doc sections and adapted to match those sections.
From telco, let's get the same subcommittee mentioned above to re-review and approve the PR (@phyy-nx @woutdenolf @rayosborn @sanbrock). Further reviews are welcome. Target is November 2nd. Thanks!
Ok! Based on Telcos and these approvals, @woutdenolf feel free to merge at your convenience. Thanks!
Thanks for the review everyone!
Closes #1212
In this PR we refactor the NXdata definition to resolve inconsistencies in definition of
@axes
,AXISNAME_indices
andAXISNAME
. NIAC discussions:The proposed changes do not change anything to the current NXdata, they just clear up ambiguity:
@AXISNAME_indices
: data dimension(s) spanned byAXISNAME
. When missing it defaults to the index or indices ofAXISNAME
in@axes
.@AXISNAME_indices
: a single integer or an array of integers.AXISNAME
must be equal to the number of dimensions it spansdimension scales
is a confusing term which comes from HDF5 terminology: useaxes
,axis coordinates
orindependent variables
depending on the contextdescription
description
in sub-sections: signals, axes, uncertainties and examplesProper code examples with be added in a separate merge request: https://github.com/nexusformat/definitions/issues/1253
A python reader would have to do something like this to figure out what all the axes are and to which data dimensions they belong
Rendering of the PR
https://hdf5.gitlab-pages.esrf.fr/nexus/nxdata_axes/classes/base_classes/NXdata.html