nexusformat / definitions

Definitions of the NeXus Standard File Structure and Contents
https://manual.nexusformat.org/

How to relate NXData to NXDetector and each other #583

Open jacobfilik opened 7 years ago

jacobfilik commented 7 years ago

Short version:

From the discussion in #580: what is the recommended way to show that the data in an NXdata comes from a certain NXdetector (and that an axis in the NXdata comes from a certain NXpositioner, etc.), and how do you show that one NXdata contains the result of processing another?

Long version:

The standard entry point for NeXus files is the NXdata group. As a developer of visualization and processing software, I can walk the @default attributes to find the plottable data a user expects to see. For a SAXS beamline, this is likely to be the raw SAXS detector image. Finding the corresponding NXdetector allows me to display extra information about this data (for example, using the transformations to convert pixel to 2 theta). For the trivial case of a single detector, I can simply find the single NXdetector in the NXinstrument, but it is now common for beamlines to have multiple detectors, so the only way to relate the NXdata to the NXdetector is by using internal HDF5 ids to determine if the datasets are the same (not ideal, requires searching the tree and puts a dependency on the HDF5 back end), or hoping that the names are similar and comparing the strings (worse). The @target attribute appears to almost do what I need (with the added bonus of linking axes to whatever has been scanned, e.g. NXpositioner, NXmonochromator..., which is also useful to know), but that is not its purpose, and it does not work for externally linked data.
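The @default walk described above can be sketched as follows. This is a minimal illustration on an in-memory stand-in for the file, not a definitive implementation: the `Node` class and `find_default_plot` are hypothetical names, not part of any NeXus library. With h5py the same logic would read the `attrs` of each group instead.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for an HDF5 group or dataset."""
    attrs: Dict[str, str] = field(default_factory=dict)
    children: Dict[str, "Node"] = field(default_factory=dict)


def find_default_plot(root: Node) -> Optional[Tuple[str, str]]:
    """Follow @default from the root down to an NXdata group.

    Returns (path of the NXdata group, name of its signal field),
    or None if the chain of @default attributes is broken.
    """
    node, path = root, ""
    while node.attrs.get("NX_class") != "NXdata":
        name = node.attrs.get("default")
        if name is None or name not in node.children:
            return None
        node, path = node.children[name], f"{path}/{name}"
    signal = node.attrs.get("signal")
    return (path, signal) if signal in node.children else None


# Tiny tree modelled on the SAXS case: one entry holding one NXdata.
saxs = Node({"NX_class": "NXdata", "signal": "data"}, {"data": Node()})
entry = Node({"NX_class": "NXentry", "default": "saxs"}, {"saxs": saxs})
root = Node({"default": "entry"}, {"entry": entry})

print(find_default_plot(root))
```

The walk ends at the signal dataset. Nothing in it says which NXdetector produced that dataset, which is the gap this issue is about.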

Additionally, many of the NXdata in the NeXus files we write are not the raw detector data, but have been produced by simple processing of the raw data. Having a standard, recommended way of stating that an NXdata results from processing a different NXdata, and so on all the way back to the NXdata that contains the raw detector image, would make it much easier to show the user how the file describes the experiment performed. For example, a fluorescence grid scan might collect raw data from an xspress3 (8 channels, 4k elements) detector, producing an NXdetector and corresponding NXdata. The 8 channels of this detector are summed to a 4k spectrum and written into another NXdata (this is the data the users would expect to see, rather than the raw data). This 4k spectrum is integrated over n regions (corresponding to n elements being investigated), producing n NXdata, the elemental maps that the user is really interested in.

Currently I have no idea how to express in the NeXus structure that the n NXdata of the elemental maps are calculated from the sum NXdata which is calculated from the raw NXdata which comes from the NXdetector, but this seems fundamentally important to describe the experiment this file reflects.

Any suggestions on the correct way to use NeXus to describe these experiments would be greatly appreciated.

phyy-nx commented 7 years ago

Thanks, this issue is well explained here and is highly relevant to what we want to do at LCLS. We also have raw, corrected and processed data.

+1.

-Aaron

(Sent by email in reply to Jacob Filik's post of Jun 27, 2017, 1:17 AM.)

rayosborn commented 7 years ago

That is a very helpful summary. This requires a lot of thought but I have two initial comments. Firstly, it is not completely accurate that this is not the purpose of the @target attribute. It was always our intention that it be used for exactly this kind of functionality, but as you have discovered, it doesn't work for this purpose with externally linked data. NeXus was designed before HDF5, so this limitation has arisen because of the way HDF5 implements external links, not because of the design. Nevertheless, it is a limitation that we have to overcome somehow.

Second, the NXprocess group is designed to allow a workflow to be documented. For example, it has a sequence number and time stamp, along with the program name and version number. It also has an NXnote group that allows for a description of the operation.

Here is an example of how we have implemented it (the multiline strings are truncated):

nxfind:NXprocess
  date = '2015-05-11T16:33:10.772632'
  note:NXnote
    data = 'Current machine: nxrs.msd.anl.gov
                Current working direct...'
    date = '2015-05-11T16:33:10.772245'
    description = 'nxfind -f sm2ru3ge5/db0073b-1/sm2ru3ge5_160K.nxs -p f1/d...'
  program = 'nxfind'
  sequence_index = 2
  version = '0.1.0'

It doesn't specifically have fields to specify the input and output groups. That could be put in the description, but it might be worth thinking of adding these. I would suggest that part of the discussion should be in how the NXprocess group could better meet your needs.
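As a rough illustration of what a reader can already recover from NXprocess, the sketch below orders process steps by sequence_index. The dictionaries stand in for NXprocess groups; `step_one` and the function name `workflow_order` are invented for the example. No input or output fields are used, since NXprocess does not define them.

```python
from typing import Any, Dict, List

# Stand-ins for NXprocess groups, keyed by group name. 'nxfind' uses the
# values from the example above; 'step_one' is an invented earlier step.
processes: Dict[str, Dict[str, Any]] = {
    "nxfind": {"program": "nxfind", "sequence_index": 2, "version": "0.1.0"},
    "step_one": {"program": "step_one", "sequence_index": 1, "version": "0.0.0"},
}


def workflow_order(procs: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return NXprocess group names sorted by their sequence_index."""
    return sorted(procs, key=lambda name: procs[name]["sequence_index"])


print(workflow_order(processes))
```

This gives the order of the steps but not which group each step read from or wrote to, which is why explicit input and output fields would help.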

prjemian commented 7 years ago

@mkoennecke : will add documentation of @napimount attribute to the manual, see #584

prjemian commented 7 years ago

I, too, agree with the intent of this issue. This issue makes more than one request, though: [1] to associate items in NXdata with their raw data locations, and [2] to correlate a sequence of NXdata (or other) groups. The @target attribute addresses [1], via the NeXus link. The counterpart, I just now learned, is the @napimount attribute for external file links.

It is [2] that will need some work.

rayosborn commented 7 years ago

I don't use the C API, so I might be mistaken, but I think the @napimount is useless in this discussion, for reasons I discuss in #580. I believe it is used by the API to create the external link, but once it has been created, it is not stored in the HDF5 file, because the only attributes associated with the linked field or group are those in the external file itself.

mkoennecke commented 7 years ago

I checked the code. You are partially right: napimount is not always written. It is not written in the HDF5 case, where links are native. But then the same information as provided by napimount can be queried from the file directly: both the name of the external file and the path into it. See NXisexternal() in napi.c and napi5.c.

jacobfilik commented 7 years ago

So I have opened a pull request into the example data repository containing a grid scan using a simulated beamline and areaDetector.

A cut down dump of the tree is shown here: (unfortunately nexpy failed to print this tree as h5py failed on some of the string datasets...)

 entry:NXentry
    instrument:NXinstrument
      mic:NXdetector
        count_time = 1.0
        data -> p45-1168-mic.hdf5['/entry/instrument/detector/data']
        total -> p45-1168-mic.hdf5['/entry/instrument/NDAttributes/StatsTotal']
      stagex:NXpositioner
        name = 'stagex'
        value = float64(5x5)
          @target = '/entry/instrument/stagex/value'
        value_set = float64(5)
          @target = '/entry/instrument/stagex/value_set'
      stagey:NXpositioner
        name = 'stagey'
        value = float64(5x5)
          @target = '/entry/instrument/stagey/value'
        value_set = float64(5)
          @target = '/entry/instrument/stagey/value_set'
    mic:NXdata
      @axes = ['stagey_value_set' 'stagex_value_set' '.' '.']
      @signal = 'data'
      @stagex_value_indices = [0 1]
      @stagex_value_set_indices = 1
      @stagey_value_indices = [0 1]
      @stagey_value_set_indices = 0
      data -> p45-1168-mic.hdf5['/entry/instrument/detector/data']
      stagex_value -> /entry/instrument/stagex/value
      stagex_value_set -> /entry/instrument/stagex/value_set
      stagey_value -> /entry/instrument/stagey/value
      stagey_value_set -> /entry/instrument/stagey/value_set
    mic_total:NXdata
      @axes = ['stagey_value_set' 'stagex_value_set' '.' '.']
      @signal = 'total'
      @stagex_value_indices = [0 1]
      @stagex_value_set_indices = 1
      @stagey_value_indices = [0 1]
      @stagey_value_set_indices = 0
      stagex_value -> /entry/instrument/stagex/value
      stagex_value_set -> /entry/instrument/stagex/value_set
      stagey_value -> /entry/instrument/stagey/value
      stagey_value_set -> /entry/instrument/stagey/value_set
      total -> p45-1168-mic.hdf5['/entry/instrument/NDAttributes/StatsTotal']

So the instrument in this scan has 1 detector "mic" and 2 positioners "stagex" and "stagey".

This scan produces 2 NXdata groups, one for mic and one for the sum value for mic.

The question really is: given this tree, how can I write code to recognise that NXdata mic_total comes from processing NXdata mic, and that NXdata mic comes from NXdetector mic?

(Please don't say they are all called mic so this is easy, or that there is only 1 detector so this is easy)

I would really like to be able to say there is a NeXus standard way to define this information (being one of the people that has to explain to beamlines what they get from NeXus/HDF5 compared to text file and stack of tiffs).

As an aside, in doing this I noticed that we do write @target on the NXpositioner values, so we don't need to write both the stage value and value_set datasets as axes in the NXdata: we could just write value_set and follow the @target to get the readback values (and much more information). Hopefully this shows how something more generalised like @DATASETNAME_target (as for axes) would be useful (sometimes our positions are written in external files too).
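For reference, the link-comparison workaround applied to this tree can be sketched as below. The link targets are copied from the dump above and held in plain dictionaries; with h5py they could be read using `group.get(name, getlink=True)`, which returns an `h5py.ExternalLink` carrying `filename` and `path`. The names `Target`, `detectors`, `nxdata_signals` and `detectors_for` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (external file name or None for an internal link, path inside that file)
Target = Tuple[Optional[str], str]

MIC_FILE = "p45-1168-mic.hdf5"

# Link targets of the fields in each NXdetector, copied from the tree dump.
detectors: Dict[str, Dict[str, Target]] = {
    "/entry/instrument/mic": {
        "data": (MIC_FILE, "/entry/instrument/detector/data"),
        "total": (MIC_FILE, "/entry/instrument/NDAttributes/StatsTotal"),
    },
}

# Link target of the signal field in each NXdata.
nxdata_signals: Dict[str, Target] = {
    "/entry/mic": (MIC_FILE, "/entry/instrument/detector/data"),
    "/entry/mic_total": (MIC_FILE, "/entry/instrument/NDAttributes/StatsTotal"),
}


def detectors_for(signal: Target) -> List[str]:
    """Return the NXdetector paths holding a link to the same target."""
    return [
        path
        for path, fields in detectors.items()
        if signal in fields.values()
    ]


for name, target in nxdata_signals.items():
    print(name, "->", detectors_for(target))
```

This ties both NXdata groups to the mic detector without relying on names, but it requires searching every NXdetector, and it cannot express that mic_total was derived from mic.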

prjemian commented 6 years ago

Needs proposal of what to do to resolve this.

zjttoefs commented 5 years ago

I'm having trouble with the issue UI here...

vasole commented 4 years ago

Dear colleagues, Is this to be closed or not?

@markbasham and I agreed on the DATASET_info approach.

However, while it can be used, at the ESRF we have ended up making sure the target is where the associated, richer information is expected. To be clear, the associated information about the data provided by a detector is to be expected at the NXdetector level, about a motor at its NXpositioner level, and so on. Therefore following the HDF5 link is, in most cases, enough.

The DATASET_info approach can still be used as it has advantages:

prjemian commented 4 years ago

@vasole: So your proposition to resolve this is:

  1. define the @target attribute (used by a NeXus link) to the HDF5 address where the associated, richer information is expected
  2. reserve the _info suffix in NeXus so it may be applied to DATASET_info to describe the relationship(s) of this data to its sources

Is that a correct summary?

What is the structure of DATASET_info? Dataset? NXnote group?

See the list of Reserved field name suffixes in the NeXus documentation.

vasole commented 4 years ago

The rationale for the proposal, and the reason for this discussion, is that the @target attribute cannot be used when dealing with external links.

The proposition is to use the suffix _info to point to a group. In the case of the issue opener, the DATASET used as signal for the plot would be accompanied by a DATASET_info pointing to the NXdetector group containing it, and the AXIS used as axis could also be accompanied by an AXIS_info pointing to the NXpositioner.

The use of the word _info instead of _target is because the former is more generic and more meaningful for somebody browsing a NeXus file using a graphical interface.

jacobfilik commented 4 years ago

So assuming that @DATASET_info just contains the same information as the @target but is on the group rather than the dataset, I am happy with this solution.

I would probably suggest using the same “strict writer, liberal reader” wording in the documentation for these features as used for the @AXISNAME_indices.
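A reader following that wording might look like the sketch below. It is only an illustration of the proposal, which has not been ratified: the attribute name `data_info` follows the DATASET_info pattern, `source_group` is a hypothetical helper, and falling back to the parent group of @target is an assumption about what a liberal reader could do.

```python
from typing import Dict, Optional

Attrs = Dict[str, str]


def source_group(signal: str, group_attrs: Attrs, field_attrs: Attrs) -> Optional[str]:
    """Return the path of the group describing where `signal` came from.

    Prefers the proposed @<signal>_info attribute on the NXdata group and
    falls back to the parent group of the field's @target, if present.
    """
    info = group_attrs.get(f"{signal}_info")
    if info is not None:
        return info
    target = field_attrs.get("target")
    # Only fall back when @target has a parent group to point at.
    if target is not None and "/" in target.strip("/"):
        return target.rsplit("/", 1)[0]
    return None


# NXdata 'mic' written by a strict writer: the group carries @data_info.
mic_attrs = {"signal": "data", "data_info": "/entry/instrument/mic"}
print(source_group("data", mic_attrs, {}))

# Axis written without _info: fall back to the @target on the field.
axis_attrs = {"target": "/entry/instrument/stagex/value_set"}
print(source_group("stagex_value_set", {}, axis_attrs))
```

Because the _info attribute sits on the NXdata group, it can still be read when the dataset itself is an external link, which is where @target fails.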

Happy to close but what is the next step? Should I/We add this to the documentation as a pull request as a first step towards it being ratified?

prjemian commented 4 years ago

The other way around. If this needs the attention of the NIAC, it should not go into the documentation yet. However, it is good to prepare this in a branch, with a pull request. This would benefit from a brief example. Once ratified by the NIAC, the PR can be merged.

(Sent by email in reply to Jacob Filik's comment of Feb 4, 2020, 3:43 AM.)