NXcxi_ptycho: how flexible is this definition?

prjemian commented 4 years ago

A group of US light sources is evaluating NeXus as a storage and exchange container to transfer raw and processed data to and from the Bluesky framework. We are focused on the NXcxi_ptycho application definition as either an example or a basis for use. We have questions that can influence whether we should suggest modifications to NXcxi_ptycho or design a more general NXptycho for raw data.

We are also interested in using a standard microscopy application definition for reduced data (if one exists). NXstxm looks attractive but may not be flexible enough for these purposes.

The questions:

NXcxi-ptycho is for raw data, right?
This application definition appears to be based on raster scan - is this flexible? Spiral scan, for instance?
Instead of raster, could provide a list of coordinates (and description of the list's components)
What about reduced ptycho data? Any microscopy application definitions? NXstxm too specific?
What about fixed detector and sample motions?
Can we replace NX_FLOAT (such as for x_pixel_size) with NX_NUMBER? We might provide an integer instead of a float.

prjemian commented 4 years ago

Bluesky: https://blueskyproject.io/

FilipeMaia commented 4 years ago

NXcxi-ptycho is for raw data, right?

This application definition appears to be based on raster scan - is this flexible? Spiral scan, for instance?

Instead of raster, could provide a list of coordinates (and description of the list's components)

It's for raw data. I agree the explicit npts_x and npts_y is a bit clumsy, as it's an optimization for raster scans which are very unusual as such a collection geometry gives rise to difficulties in the reconstruction. But you can set one of them to 1 and the other one equal to the number of points you have. That way you can specify all the positions with no geometry assumptions.

Cheers, Filipe
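To illustrate the workaround described above, here is a minimal sketch for a spiral scan. It assumes a Fermat spiral; the function name `fermat_spiral`, the step size and the point count are made up for this example. Only `npts_x` and `npts_y` come from the definition as discussed in this thread.

```python
import math


def fermat_spiral(n_points, step):
    """Return x and y position lists for a Fermat spiral (illustrative scan pattern)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    xs, ys = [], []
    for n in range(n_points):
        radius = step * math.sqrt(n)
        theta = n * golden_angle
        xs.append(radius * math.cos(theta))
        ys.append(radius * math.sin(theta))
    return xs, ys


# Hypothetical scan: 200 points, 0.5 micron step (values in metres).
xs, ys = fermat_spiral(200, 0.5e-6)

# No grid assumption: one dimension holds every point, the other is 1.
npts_x = len(xs)
npts_y = 1
print(npts_x, npts_y)
```

Each frame then has exactly one (x, y) pair, and nothing in the file implies a raster geometry.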

prjemian commented 4 years ago

@FilipeMaia : Thanks!

benajamin commented 4 years ago

We have actually used NXstxm for some ptycho measurements. It was written with arbitrary scan patterns and detector types in mind. I'm not sure what you mean by:

What about fixed detector and sample motions?

prjemian commented 4 years ago

That refers to a raster scan of the sample while the detector position remains fixed. Assume a raster scan of image frames, each at a different sample position (and the same detector position). How is this handled?

JulReinhardt commented 4 years ago

It's for raw data. @FilipeMaia would it be in scope to extend to results data?

FilipeMaia commented 4 years ago

Sure, I absolutely think it makes sense to extend it to results. I should point out that, while NXcxi_ptycho draws inspiration from CXI, @aaron-parsons is the person behind it, not me.

prjemian commented 4 years ago

Let's discuss in the next (Sept 4) NeXus telco if we can generalize this application definition.

vasole commented 4 years ago

One thing we really appreciate about the NXtomo definition is that it does not assume any regularity in the acquisition. That approach should be extended whenever possible. Regular grids should be considered a particular case. A regular grid can even be detected without being flagged as such.

Instead of the particular regular-grid case (which in real life might turn out to be not as regular as expected!), it is much simpler to understand and deal with things based on the acquisition itself, as shown below, without having to split npoints into the dimensions of the regular grid. Furthermore, virtual datasets can provide access to those data under a different layout if needed.

[actual_file_path_excluded]whatever_positioner1[npoints]
[actual_file_path_excluded]whatever_positioner2[npoints]
[actual_file_path_excluded]data[npoints, nchannels]@interpretation="spectrum"
[actual_file_path_excluded]data[npoints, nrows, ncolumns]@interpretation="image"
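As a rough illustration of detecting a grid from per-point positions alone, here is a minimal sketch. The function name `detect_full_grid` and the tolerance are assumptions made for this example. It only checks that the points fill a complete rectangular grid, not that the spacing is uniform.

```python
def detect_full_grid(xs, ys, tol=1e-9):
    """Return (nx, ny) if the points fill a complete rectangular grid, else None."""

    def axis_values(values):
        # Collapse values closer than `tol` into a single grid line.
        unique = []
        for v in sorted(values):
            if not unique or v - unique[-1] > tol:
                unique.append(v)
        return unique

    def nearest_index(value, axis):
        return min(range(len(axis)), key=lambda i: abs(axis[i] - value))

    grid_x, grid_y = axis_values(xs), axis_values(ys)
    if len(grid_x) * len(grid_y) != len(xs):
        return None
    # Every grid node must be visited exactly once.
    visited = {
        (nearest_index(x, grid_x), nearest_index(y, grid_y))
        for x, y in zip(xs, ys)
    }
    return (len(grid_x), len(grid_y)) if len(visited) == len(xs) else None


raster_x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
raster_y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(detect_full_grid(raster_x, raster_y))        # (3, 2)

irregular_x = [0.0, 1.0, 2.0, 3.0]
irregular_y = [0.0, 1.0, 4.0, 9.0]
print(detect_full_grid(irregular_x, irregular_y))  # None
```

A reader can therefore recover the grid shape when it exists, while the file itself stores only flat per-point arrays.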

benajamin commented 4 years ago

That refers to a raster scan of the sample while the detector position remains fixed. Assume a raster scan of image frames, each at a different sample position (and the same detector position). How is this handled?

This is exactly the expected situation in NXstxm. Is there an alternative situation possible for ptychography? Scanning the detector doesn't make sense.

prjemian commented 4 years ago

We'll direct our attention to examine NXstxm for possible use or derive a new application definition from that if needed.

JulReinhardt commented 4 years ago

@benajamin

In the NXstxm definition: https://manual.nexusformat.org/classes/applications/NXstxm.html it says for the data entry: "Detectors that provide more than one value per scan point should be summarised to a single value per scan point for this array in order to simplify plotting." But you can still use 2D frames here and, if needed, add multiple datasets by having entries like data_1, data_2 ...? What about all the 2D-frame-related information, such as beam center on detector, frame size, distance, etc., that is covered in NXcxi_ptycho? I like the "scanning mode" entry; how well do people stick with the defined names?

vincefn commented 4 years ago

NXcxi-ptycho is for raw data, right?

This application definition appears to be based on raster scan - is this flexible? Spiral scan, for instance?

Instead of raster, could provide a list of coordinates (and description of the list's components)

It's for raw data. I agree the explicit npts_x and npts_y is a bit clumsy, as it's an optimization for raster scans which are very unusual as such a collection geometry gives rise to difficulties in the reconstruction. But you can set one of them to 1 and the other one equal to the number of points you have. That way you can specify all the positions with no geometry assumptions.

I'm still not convinced about this approach: would it not be better to have the format rely on the simplest possible scan definition ? Any information about the actual geometry used -which is only useful for specific (software or beamline-dependent) implementations- could be stored in extra fields, including virtual datasets, etc...

While I globally like this CXI implementation, designing it for a special case instead of a general one still does not seem right.

Alternatively if raster-based ptycho scans need to be standardised, why not have two closely-related formats, NXcxi-ptycho and NXcxi-ptycho_grid ?

JulReinhardt commented 4 years ago

I'm still not convinced about this approach: would it not be better to have the format rely on the simplest possible scan definition ? Any information about the actual geometry used -which is only useful for specific (software or beamline-dependent) implementations- could be stored in extra fields, including virtual datasets, etc...

Yes, all required fields should be kept to the minimum actually required for the basic reconstruction and optional fields allow for more detailed input.

While I globally like this CXI implementation, designing it for a special case instead of a general one still does not seem right.

The idea is to evaluate if the current cxi, NXcxi_ptycho and NXstxm definitions are general enough so that most beamlines could adapt these.

Alternatively if raster-based ptycho scans need to be standardised, why not have two closely-related formats, NXcxi-ptycho and NXcxi-ptycho_grid ?

Do you mean 2 different NX definitions or different fields for these?

vasole commented 4 years ago

That refers to a raster scan of the sample while the detector position remains fixed. Assume a raster scan of image frames, each at a different sample position (and the same detector position). How is this handled? This is exactly the expected situation in NXstxm. Is there an alternative situation possible for ptychography? Scanning the detector doesn't make sense.

For everything I expect either a single value or as many values as acquisition points. Whether or not you move the detector or the sample, the data are straightforward to interpret if you know the motors associated with each of them.

Going for definitions implying particularities is a mistake.

vincefn commented 4 years ago

While I globally like this CXI implementation, designing it for a special case instead of a general one still does not seem right.

The idea is to evaluate if the current cxi, NXcxi_ptycho and NXstxm definitions are general enough so that most beamlines could adapt these.

I understand. I'm more commenting on the choices made in the format.

Alternatively if raster-based ptycho scans need to be standardised, why not have two closely-related formats, NXcxi-ptycho and NXcxi-ptycho_grid ?

Do you mean 2 different NX definitions or different fields for these?

Personally I'd push for a single format, without any grid assumption, and add any 'extra' information required for site-specific processing (e.g. that it is on a raster grid for combination with fluo, or that it is part of a ptycho-tomo or multi-energy (spectro-ptycho) dataset) in private (not standardised) fields.

But if there is a need (and by 'need' I mean that enough facilities will use the format) for a grid scan then you could define two NX definitions, or allow (if that's legal) different options in one format.

FilipeMaia commented 4 years ago

I'm also of the opinion that the extra logic to support regular grids is undesirable as its use case in ptychography is almost non-existent and the storage gains are very marginal.

JulReinhardt commented 4 years ago

I think that providing the positions without explicit grid assumption is a good general approach. Position values for each frame should be sufficient for any reconstruction algorithm.

I went through cxi, NXcxi_ptycho and NXstxm in comparison and some questions/thoughts came to my mind:

1) Cxi format contains information such as detector distance or pixel size (or corner position) in either instrument (raw) or image (processed) → What about a combination of raw and derived data?
2) NXcxi_ptycho does not contain fields for derived data (image)
3) Pixel size of reconstructed image is not included in either.
4) What hierarchy makes most sense? It differs slightly between cxi and NXcxi even.
5) Cxi contains energy in instrument/source → beam or machine?

Any comments or clarifications if I got something wrong would be appreciated. Thanks.

vincefn commented 4 years ago

Cxi format contains information such as detector distance or pixel size (or corner position) in either instrument (raw) or image (processed) → What about a combination of raw and derived data?

NXcxi_ptycho does not contain fields for derived data (image)

Pixel size of reconstructed image is not included in either.

Neither the ptycho CXI example nor NXcxi_ptycho has any information about the derived image. I guess it's less important in terms of standardisation than raw data (which has to be archived), though it could be useful.

There are however examples (for CDI) of output in the CXI definition, and you can build from that. For PyNX I followed those examples but with more fields (/entry1/result*/data) to save object, probe, positions, incoherent background, floating intensities, and all the parameters used for the optimisation, so it's quite customised. See example files in http://ftp.esrf.fr/pub/scisoft/PyNX/data/ - which I don't claim to be standard, there is a lot of extra information saved mostly for the purpose of book-keeping.

What hierarchy makes most sense? It differs slightly between cxi and NXcxi even.

I thought the hierarchy was mostly identical, with a few more fields for NXcxi

Cxi contains energy in instrument/source → beam or machine?

In the original CXI definition, it was the X-ray beam's. That conflicts with the NX definition where it is supposed to be the machine's energy. See my initial comment in #647 . Not much we can do about this for old files except check if the X-ray beam energy is available in the correct place, and if not check the order of magnitude for the energy in instrument/source.
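A minimal sketch of that fallback check for old files follows. Everything here is an assumption made for illustration: the 1 MeV threshold, the function name `find_photon_energy_ev`, the path strings used as dictionary keys, and the premise that values have already been converted to eV. A plain dictionary stands in for the file.

```python
# Assumption: X-ray photon energies are far below 1 MeV, while
# storage-ring or linac (machine) energies are far above it.
MACHINE_ENERGY_THRESHOLD_EV = 1.0e6


def find_photon_energy_ev(fields):
    """fields: mapping of path -> energy in eV (paths are illustrative only)."""
    # Preferred location: the beam group.
    beam_energy = fields.get("instrument/beam/energy")
    if beam_energy is not None:
        return beam_energy
    # Legacy location: accept it only if the magnitude looks like a photon energy.
    legacy = fields.get("instrument/source/energy")
    if legacy is not None and legacy < MACHINE_ENERGY_THRESHOLD_EV:
        return legacy
    return None


old_file = {"instrument/source/energy": 8000.0}  # photon energy in the old place
ring_file = {"instrument/source/energy": 6.0e9}  # machine energy, no beam entry
print(find_photon_energy_ev(old_file))   # 8000.0
print(find_photon_energy_ev(ring_file))  # None
```

The threshold is deliberately coarse; a real reader would also need to handle the units stored in the file.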

JulReinhardt commented 4 years ago

Neither the ptycho CXI example nor NXcxi_ptycho has any information about the derived image. I guess it's less important in terms of standardisation than raw data (which has to be archived), though it could be useful.

Yes, in the cxi documentation the cxi example for ptycho experiment does not contain derived data, however the example for phased 3D image does. So using image_* as the entry for results seems like an obvious solution.

There are however examples (for CDI) of output in the CXI definition, and you can build from that. For PyNX I followed those examples but with more fields (/entry1/result*/data) to save object, probe, positions, incoherent background, floating intensities, and all the parameters used for the optimisation, so it's quite customised.

Nice example, I like using result_* . It could be a good starter for discussion, what naming convention makes most sense.

I thought the hierarchy was mostly identical, with a few more fields for NXcxi.

When I look at the hierarchies for cxi and NXcxi, it looks to me like in the original cxi definition everything sits under entry*, whereas in NXcxi DATA and data exist on the top level, with links for DATA, data_ to the entries under instrument/detector/data (which also exist in cxi). Also, sample* is under the entry* level in the original cxi and on the top level in NXcxi. Why does NeXus need these extra links to top-level entries? @prjemian ?

In the original CXI definition, it was the X-ray beam's. That conflicts with the NX definition where it is supposed to be the machine's energy. See my initial comment in #647 . Not much we can do about this for old files except check if the X-ray beam energy is available in the correct place, and if not check the order of magnitude for the energy in instrument/source.

As of today it looks like NXcxi has both source/energy = energy of the machine and beam/energy = energy of the beam. So that should be general enough to serve ptycho (machine energy not required) and CDI (machine energy for XFEL). Objections?

prjemian commented 4 years ago

@JulReinhardt : The links to data are a matter of style and not requirement.

A definition might require certain information to be available at a specific location. From a bluesky run, the tool I use writes the run's data and metadata into directories under /entry/instrument/ since that is raw data from the instrument.

Links are used to avoid writing the same information multiple times.

JulReinhardt commented 4 years ago

Neither the ptycho CXI example nor NXcxi_ptycho has any information about the derived image. I guess it's less important in terms of standardisation than raw data (which has to be archived), though it could be useful.

Another thought on derived and raw data. I think standardization of derived data could have a great impact on post-reconstruction analysis, which many users mainly care about.

FilipeMaia commented 4 years ago

Just a couple of comments. Using source/energy in CXI for photon energy was an unfortunate mistake of mine, by misinterpreting the documentation. I'm trying to nudge people to use the Beam class for new files. Also does the incident_energy in NXbeam refer to the energy of each photon?

I agree with @JulReinhardt that the reconstructed data is what users will care about. On CXI the idea was to have it in image_* but I can see how using result could also make sense.

prjemian commented 4 years ago

In NXbeam,

variables such as the incident energy could be scalar values or arrays. ... in which it is useful to specify the beam profile...

In this case, the array is per photon? Or beam profile? Seems to fit the documentation.

vincefn commented 4 years ago

Just a couple of comments. Using source/energy in CXI for photon energy was an unfortunate mistake of mine, by misinterpreting the documentation. I'm trying to nudge people to use the Beam class for new files.

The CXI examples still use source/energy.

Also does the incident_energy in NXbeam refer to the energy of each photon?

The CXI definition says "Energy of the incident beam", and NeXus describes NXbeam as "Properties of the neutron or X-ray beam at a given location" and incident_energy as "Energy on entering beamline component", so it refers to the average photon energy in the beam.

I agree with @JulReinhardt that the reconstructed data is what users will care about.

Yes, raw data may be more fundamental as it's supposed to be what's archived but for techniques like ptychography where the analysis is supposed to be robust it's likely the users will only care about the final image.

If we want to go to a reconstructed data format, should this be for 2D projections or also allow a full ptycho-tomo dataset ?

On CXI the idea was to have it in image_* but I can see how using result could also make sense.

True, I used result_* for all results in pynx, but image_* should have been used instead...

FilipeMaia commented 4 years ago

The CXI examples still use source/energy.

Very good point! I'll start on updating the format documentation and mark source/energy as deprecated. If there are other changes in the format that you think make sense please let me know.

Also does the incident_energy in NXbeam refer to the energy of each photon?

The CXI definition says "Energy of the incident beam", and NeXus describes NXbeam as "Properties of the neutron or X-ray beam at a given location" and incident_energy as "Energy on entering beamline component", so it refers to the average photon energy in the beam.

I think the documentation could be clearer. For XFELs, beam energy can be easily misunderstood as the total energy of one pulse.
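To make the distinction concrete, here is a small worked example relating the energy of one photon to the total energy of one pulse. The 8 keV photon energy and 1 mJ pulse energy are made-up illustrative values, not taken from any facility.

```python
ELECTRON_VOLT_IN_JOULE = 1.602176634e-19  # exact by SI definition

photon_energy_ev = 8000.0  # energy of a single photon (made-up value)
pulse_energy_j = 1.0e-3    # total energy of one pulse (made-up value)

# Convert the single-photon energy to joules, then count photons per pulse.
photon_energy_j = photon_energy_ev * ELECTRON_VOLT_IN_JOULE
photons_per_pulse = pulse_energy_j / photon_energy_j
print(f"{photons_per_pulse:.2e} photons per pulse")
```

The two quantities differ by nearly twelve orders of magnitude here, so a field that does not say which one it holds is easy to misread.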

phyy-nx commented 4 years ago

I'm only superficially following this thread, but NXmx adds a lot to the incident_wavelength documentation. Maybe relevant?

FilipeMaia commented 4 years ago

@phyy-nx thanks that's interesting. The wavelength shot-by-shot is something I already use in CXI. Taking the shot-by-shot spectrum into account is also nice! What I'm still missing though is someplace to record the total energy per FEL pulse, as given by a gas monitor detector. I see you have total_flux, but that depends on time, which is both hard to measure and not very well defined.

benajamin commented 4 years ago

What I'm still missing though is someplace to record the total energy per FEL pulse, as given by a gas monitor detector.

@FilipeMaia You should consider NXmonitor for this purpose. In my NXstxm files, we record the synchrotron beam current in an NXmonitor as a proxy for time-variations in photon flux.

FilipeMaia commented 4 years ago

@benajamin Is there any advantage of adding it to NXmonitor instead of extending NXbeam as NXmx does, which seems simpler?

benajamin commented 4 years ago

I suppose it is a question of how well the measurement reflects the physical quantity you need for the analysis. Presenting the flux measurement in NXbeam is telling the data consumer that it is the quantity, while presenting it as NXmonitor merely indicates that the data consumer can use it as a normalisation signal without the promises of being accurate and well calibrated, etc.
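As a sketch of what "use it as a normalisation signal" can mean in practice, the following divides each scan point's intensity by the relative monitor reading. The function name `normalise_by_monitor` and the sample numbers are hypothetical. The monitor is treated as a relative signal only, with no claim about calibration.

```python
def normalise_by_monitor(intensities, monitor):
    """Scale each intensity by mean(monitor) / monitor[i]."""
    if len(intensities) != len(monitor):
        raise ValueError("intensities and monitor must have the same length")
    mean_monitor = sum(monitor) / len(monitor)
    return [i * mean_monitor / m for i, m in zip(intensities, monitor)]


# Made-up readings: the signal follows a 10 % drift in the monitor.
intensities = [100.0, 90.0, 110.0]
monitor = [1.0, 0.9, 1.1]
normalised = normalise_by_monitor(intensities, monitor)
print(normalised)
```

After normalisation the three values are approximately equal, which is the behaviour one expects when the variation came from the source rather than the sample.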

FilipeMaia commented 4 years ago

@benajamin thanks, that's very helpful! Then I'll go ahead and extend NXbeam with pulse_energy.

prjemian commented 3 years ago

What changes are proposed for NXcxi_ptycho at this time? There is no branch or PR. Perhaps completion of any changes could be pushed to the next milestone so this does not delay the imminent 2020.10 release.

prjemian commented 3 years ago

Moving this discussion issue from the Milestone for release this month.

nexusformat / definitions

NXcxi_ptycho: how flexible is this definition? #792