nexusformat / definitions

Definitions of the NeXus Standard File Structure and Contents
https://manual.nexusformat.org/

Math support in NeXus #711

Open phyy-nx opened 4 years ago

phyy-nx commented 4 years ago

Hi folks, I know this opens up a topic previously closed (see #376), but I think it's worth revisiting. We have two detectors, the Jungfrau 16M at SwissFEL and the AGIPD detector at EuXFEL, that both have complex calibration procedures. These detectors have very high data rates and it is costly both in run-time at the beamline and in storage space to apply the calibrations in memory and then write out the data to disk (for example, the calibration converts from int16 to float64). It would be better to instead have the master file link to the raw data and the gain and pedestal maps, and then specify how the corrections are applied by an equation.

A simplified version of the equation is

calibrated_data = (raw_data-pedestal)/gain

(this is complicated by the bitwise operators needed to select between the 3 gains)
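A minimal sketch of the simplified equation above, including the bitwise gain selection, in plain Python. It assumes a Jungfrau-style encoding that this thread does not spell out: each raw pixel is an unsigned 16-bit word whose top two bits select one of the 3 gain stages and whose low 14 bits hold the ADC value, with one pedestal map and one gain map per stage. The helper names (`calibrate_pixel`, `calibrate_frame`) and the code-to-stage mapping are hypothetical; the real bit layout and calibration constants should be checked against the detector documentation.

```python
# Sketch only: assumes a Jungfrau-style raw word (unsigned 16-bit) where the
# top 2 bits encode the gain stage and the low 14 bits hold the ADC value.
GAIN_SHIFT = 14
ADC_MASK = (1 << GAIN_SHIFT) - 1  # 0x3FFF, the low 14 bits

# Hypothetical mapping from the 2-bit gain code to a gain-stage index (0, 1, 2).
# Code 0b10 is treated as invalid here.
GAIN_CODE_TO_STAGE = {0b00: 0, 0b01: 1, 0b11: 2}


def calibrate_pixel(raw_word, pedestals, gains):
    """Return (adc - pedestal) / gain for one pixel.

    pedestals and gains hold this pixel's value for each of the 3 gain stages.
    """
    gain_code = (raw_word >> GAIN_SHIFT) & 0b11  # bitwise gain selection
    adc = raw_word & ADC_MASK
    stage = GAIN_CODE_TO_STAGE.get(gain_code)
    if stage is None:
        raise ValueError(f"invalid gain code {gain_code:#04b}")
    return (adc - pedestals[stage]) / gains[stage]


def calibrate_frame(raw_frame, pedestal_maps, gain_maps):
    """Calibrate a flattened frame.

    pedestal_maps[stage][i] and gain_maps[stage][i] are the constants for
    pixel i in the given gain stage.
    """
    n_stages = len(pedestal_maps)
    return [
        calibrate_pixel(
            word,
            [pedestal_maps[s][i] for s in range(n_stages)],
            [gain_maps[s][i] for s in range(n_stages)],
        )
        for i, word in enumerate(raw_frame)
    ]
```

This per-pixel arithmetic is what the master file would have to describe if the corrected float64 frames were not written to disk. A production version would vectorise it with NumPy instead of looping in Python.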

So, as part of the 2020 code camp, we're reopening this issue. How could we build NeXus files that include the raw data along with instructions on how to apply the calibrations? What language could the equation be written in? Would we need to look into parsers ourselves? If so, which ones are available?

(passes ball to @benajamin)

FilipeMaia commented 4 years ago

I think this is a very interesting subject, but one should be careful not to make things too complex. I wonder if the same result could be achieved with a smart HDF5 filter, instead of handling it explicitly in NeXus?

benajamin commented 4 years ago

Yes, this issue has a strong use-case, but is fraught with problems. Certainly, executable code in data files could be a security nightmare, especially if it isn't heavily restricted. My previous research found a few maths parsers that could be useful since they both limit what the "embedded code" can do and promise good performance in crunching the numbers. Here are some maths parsers I found:

| Name | Implemented in | License | Bindings/translations |
|---|---|---|---|
| MuParser | C++ | MIT | Python |
| MuParserX | C++ | BSD | |
| mXparser | Java, C#/.NET | Simplified BSD | |
| ExprTk | C++ | MIT | Python |
| TinyExpr | ANSI C | zlib | |
| JS Expr-Eval | JavaScript | MIT | Python |

The next step would be to try actually making use of one (or more) of these parsers to make sure it is a viable solution to your problem.

benajamin commented 4 years ago

For a NeXus solution, we would need the following ingredients:

We could use a null dataset as the location for the result, and attach the equation string and a set of variable pointers to it as attributes. A successful evaluation would then replace the null dataset with the resulting array of values. For example, a file could contain:

    /entry/instrument/detector/raw_data = array
    /entry/instrument/detector/offset = scalar or array
    /entry/instrument/detector/gain = scalar or array
    /entry/instrument/detector/data = null
    /entry/instrument/detector/data@evaluate = "(A-B)/C"
    /entry/instrument/detector/data@var_A = "/entry/instrument/detector/raw_data"
    /entry/instrument/detector/data@var_B = "/entry/instrument/detector/offset"
    /entry/instrument/detector/data@var_C = "/entry/instrument/detector/gain"
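A reader of the null-dataset layout above could resolve and evaluate it roughly as follows. This is a sketch under several assumptions: the file is mocked as plain dictionaries (no HDF5 library is involved), arrays are flat lists with scalar broadcasting, and only `+ - * /` on numbers and named variables are accepted. Instead of passing the string to `eval`, the expression is parsed with Python's `ast` module and walked node by node, so anything outside that small grammar (function calls, attribute access, imports) is rejected. `evaluate_null_dataset` is a hypothetical helper, not an existing NeXus API.

```python
import ast
import operator

# Only these binary operators are accepted; anything else is rejected.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _apply(op, left, right):
    """Apply op element-wise, broadcasting a scalar against a list."""
    if isinstance(left, list) and isinstance(right, list):
        if len(left) != len(right):
            raise ValueError("shape mismatch")
        return [op(a, b) for a, b in zip(left, right)]
    if isinstance(left, list):
        return [op(a, right) for a in left]
    if isinstance(right, list):
        return [op(left, b) for b in right]
    return op(left, right)


def _walk(node, variables):
    """Evaluate an allow-listed AST node; raise ValueError on anything else."""
    if isinstance(node, ast.Expression):
        return _walk(node.body, variables)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _apply(_BINOPS[type(node.op)],
                      _walk(node.left, variables),
                      _walk(node.right, variables))
    if isinstance(node, ast.Name) and node.id in variables:
        return variables[node.id]
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    raise ValueError(f"disallowed expression element: {ast.dump(node)}")


def evaluate_null_dataset(attrs, datasets):
    """Map each var_X attribute to the dataset it points at, then evaluate
    the 'evaluate' attribute with X bound to that dataset's value."""
    variables = {
        name[len("var_"):]: datasets[path]
        for name, path in attrs.items()
        if name.startswith("var_")
    }
    tree = ast.parse(attrs["evaluate"], mode="eval")
    return _walk(tree, variables)
```

For example, with `raw_data = [110, 120, 130]`, `offset = 10` and `gain = [2, 4, 5]`, the `"(A-B)/C"` expression evaluates to `[50.0, 27.5, 24.0]`. Writing that result back in place of the null dataset is left out; with an HDF5 library it would likely mean deleting and recreating `/entry/instrument/detector/data`.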

PeterC-DLS commented 4 years ago

There was mention of using TeX. A quick search brings up:

PeterC-DLS commented 3 years ago

There's a new game in town: https://github.com/lucasvr/hdf5-udf

phyy-nx commented 3 years ago

From NIAC 2020:

@phyy-nx will investigate.

phyy-nx commented 9 months ago

From the telco on Dec 20, 2023: we would like to form a subcommittee to resolve issues related to math support in NeXus. Current members: @domna @benajamin @PeterC-DLS @yayahjb. Please distribute your discussions! We'd like to resolve this by NIAC 2024 (Sept 2024).

FilipeMaia commented 9 months ago

FYI here's an example of what could be done with an HDF5 filter, https://github.com/FilipeMaia/h5calib

phyy-nx commented 9 months ago

@FilipeMaia a filter like this could be (or maybe is) a conda package, right?

FilipeMaia commented 9 months ago

That repository was just a test to see if it could be done. But it could certainly be more carefully developed and turned into a conda package.

phyy-nx commented 9 months ago

👍

woutdenolf commented 3 days ago

My opinion on the math/formula discussion: do not introduce string-based types or fields meant for anything other than documentation, meaning they are not intended to be evaluated, parsed, validated or dereferenced by a machine. Validators and parsers, not just evaluators, can be exploited by a carefully crafted NX_CHAR.

phyy-nx commented 3 days ago

Proposal, four options:

Option 1) Users may express formulas in NX_CHAR. It is assumed they will not be evaluated.

Option 2) NeXus will support math using the following steps:

  1. Choose a grammar. Requested features include standard arithmetic, log, exponentiation, and bit shift.
  2. Add a new type, NX_MATH, holding a formula string.
  3. Define how symbols are used: each symbol refers to a field in the same group as the NX_MATH field. Use links as needed.
  4. In the NX_MATH documentation, provide prominent warnings, including that software should not directly evaluate NX_MATH fields without checking that the expression matches the grammar exactly.
  5. Provide a list of software packages that implement at least grammar checking, and possibly evaluation. Include the major programming languages (Python, C++, R, JavaScript).

Option 3) NeXus shall not be code. It is permissible to describe calibration procedures, but not to include formulas.

Option 4) Punt decision for more research and debate. FairMAT will use formula descriptions instead of formulas.
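To make steps 1, 3 and 4 of Option 2 concrete, here is a rough sketch of a grammar checker. It is only an illustration: Python's own expression syntax (via the standard `ast` module) stands in for a grammar NeXus has not chosen, and the allowed operators, the single `log` function, and the length cap are assumptions based on the features requested in step 1. An expression is accepted only if every node is on the allow-list and every symbol names a field in the same group; nothing is ever evaluated. The Python documentation warns that `ast.parse` can crash the interpreter on sufficiently large or complex input, a concrete case of the point above that parsers and validators are themselves an attack surface; the length cap and exception handling are partial mitigations only. `check_nx_math` is a hypothetical name.

```python
import ast

# Hypothetical NX_MATH grammar: arithmetic, exponentiation, bit shifts,
# unary +/-, numeric literals, one function (log), and symbols that must
# name fields in the same group as the NX_MATH field.
ALLOWED_BINOPS = (ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow,
                  ast.LShift, ast.RShift)
ALLOWED_UNARYOPS = (ast.UAdd, ast.USub)
ALLOWED_FUNCTIONS = {"log"}
MAX_LENGTH = 512  # assumed cap on expression length


def check_nx_math(expression, group_fields):
    """Return True only if expression matches the grammar above.

    group_fields: names of fields in the same group (the allowed symbols).
    """
    if len(expression) > MAX_LENGTH:
        return False
    try:
        tree = ast.parse(expression, mode="eval")
        return _ok(tree.body, set(group_fields))
    except (SyntaxError, ValueError, RecursionError, MemoryError):
        return False


def _ok(node, symbols):
    """Recursively check that every node is on the allow-list."""
    if isinstance(node, ast.BinOp):
        return (isinstance(node.op, ALLOWED_BINOPS)
                and _ok(node.left, symbols) and _ok(node.right, symbols))
    if isinstance(node, ast.UnaryOp):
        return isinstance(node.op, ALLOWED_UNARYOPS) and _ok(node.operand, symbols)
    if isinstance(node, ast.Call):
        return (isinstance(node.func, ast.Name)
                and node.func.id in ALLOWED_FUNCTIONS
                and len(node.args) == 1 and not node.keywords
                and _ok(node.args[0], symbols))
    if isinstance(node, ast.Name):
        return node.id in symbols
    if isinstance(node, ast.Constant):
        return type(node.value) in (int, float)  # rejects bool, str, bytes
    return False
```

For example, `check_nx_math("(raw - pedestal) / gain", ["raw", "pedestal", "gain"])` returns True, while expressions that call anything other than `log`, access attributes, or name fields outside the group return False. The Jungfrau-style gain selection mentioned at the top of this issue also needs bit masking (`&`), which is not among the features requested in step 1; supporting it would mean adding `ast.BitAnd` to the allowed operators.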

Option 4 was adopted by vote at NIAC 2024.

(leaving issue open for now)