prjemian / punx

Python Utilities for NeXus HDF5 files
https://prjemian.github.io/punx

Is having non-NeXus content "not generally acceptable"? #239

Open biochem-fan opened 10 months ago

biochem-fan commented 10 months ago

I generate my NeXus file by adding metadata to the *_master.h5 file written by a Dectris EIGER detector. Thus, the file contains many Dectris-specific items that are not defined in NeXus.

punx flags them as "WARN: NXcollection contains non-NeXus content" and at the bottom of the output WARN is explained as "does not meet NeXus specification, not generally acceptable".

I wonder if having non-NeXus content is really "not generally acceptable".

CC: @phyy-nx

prjemian commented 10 months ago

It comes down to the meaning of generally acceptable. While this Dectris-specific content might be known to custom data file readers, it is not guaranteed that a general reader might understand what to do with the content. A validation finding marked as a warning is assigned a lower numerical value than a finding that meets the specification. Only findings marked as errors carry a strongly negative value. The overall finding value for the data file gives a very coarse opinion of how general this data file is.

In the Naming Conventions section of the NeXus User Guide, there is a table of Reserved prefixes. Are you using the DECTRIS_ prefix as described there?

As noted in #238, the reserved prefixes feature is evaluated based on the version of the NeXus definitions in use. If I recall, the reserved prefixes were adopted after the v2018.5 release. You can add the latest definitions version (v2022.07) with the punx install command:

punx install v2022.07

By default, punx will validate using the latest version (by release date) installed locally. If you are using the DECTRIS_ prefix, validation with a more recent version may change the findings of this Dectris-specific content.
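
The "latest version by release date" rule can be illustrated with a small sketch. It is not punx's own code: the helper name `default_file_set` is hypothetical, and the rows are a subset copied from the `punx config` table shown later in this thread. Under those assumptions, picking the most recent date selects `main`, which matches the "default NXDL file set" reported there.

```python
from datetime import datetime

# (file set name, date & time) rows copied from the `punx config` output
# later in this thread; only a subset is listed here.
FILE_SETS = [
    ("v2018.5", "2018-05-15 16:34:19"),
    ("v2020.1", "2020-01-31 04:17:34"),
    ("v2022.07", "2022-08-02 06:43:48"),
    ("main", "2023-06-26 08:57:16"),
]


def default_file_set(file_sets):
    """Return the name of the file set with the most recent date.

    Hypothetical helper: it only mirrors the rule described above.
    """
    latest = max(
        file_sets,
        key=lambda row: datetime.strptime(row[1], "%Y-%m-%d %H:%M:%S"),
    )
    return latest[0]


print(default_file_set(FILE_SETS))  # main
```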

prjemian commented 10 months ago

To find what definitions versions you have available, run: punx config

biochem-fan commented 10 months ago

Are you using the DECTRIS_ prefix as described there?

No.

You can add the latest definitions version (v2022.07) with the punx install command:

Yes, I am using the latest definitions:

# main          user   2023-06-26 08:57:16 d669ffb /home1/XXXX/.config/punx/main

It comes down to the meaning of generally acceptable. While this Dectris-specific content might be known to custom data file readers, it is not guaranteed that a general reader might understand what to do with the content.

My question is, is it acceptable to have items not defined in NeXus? Of course I understand that a general reader might not understand what to do with the content. I don't expect or require general readers to use those additional items. They are stored as supplemental information, just for the record.

If the NeXus standard requires that a NeXus file must not contain additional non-NeXus data, I will make two files, one with NeXus fields only and the other for additional data. But is such separation really necessary? I am new to NeXus and haven't gone through all the formal specifications, so any suggestions would be welcome.

prjemian commented 10 months ago

My question is, is it acceptable to have items not defined in NeXus?

Yes, it is acceptable. Too bad this is not in the list of frequently asked questions. I'll fix that soon. The easiest reference to find in the NeXus User Guide is in the NeXus Class Definitions section. These paragraphs appear under Base classes:

Base class definitions are permissive rather than restrictive. While the terms defined aim to cover most possible use cases, and to codify the spelling and meaning of such terms, the class specifications cannot list all acceptable groups and fields. To be able to progress the NeXus standard, additional data (groups, fields, attributes) are acceptable in NeXus HDF5 data files.

Users are encouraged to find the best defined location in which to place their information. It is understood there is not a predefined place for all possible data.

Validation procedures should treat such additional items (not covered by a base class specification) as notes or warnings rather than errors.

The punx code reports these findings as warnings.

prjemian commented 10 months ago

Would you agree that the wording here seems a bit strong and could be relaxed a bit? Instead of "not generally acceptable", it could report "not generally recognized".

prjemian commented 10 months ago

Actually, whether an item is reported as NOTE or WARN depends on the details of the finding. The first table here lists the different possible findings. Softening the wording of the WARN finding may not be the best solution for this case.

Can you report here some of the WARN findings? Here's my example using the same version of the NeXus definitions and the S2p5min_00070_00001.h5 example file:

(bluesky_2023_3) prjemian@arf:~/.../BCDA-APS/gemviz$ punx config

!!! WARNING: this program is not ready for distribution.

Locally-available versions of NeXus definitions (NXDL files)
============= ====== =================== ======= ==================================================================
NXDL file set cache  date & time         commit  path                                                              
============= ====== =================== ======= ==================================================================
a4fd52d       source 2016-11-19 01:07:45 a4fd52d /home/prjemian/Documents/projects/prjemian/punx/punx/cache/a4fd52d
v3.3          source 2017-07-12 10:41:12 9285af9 /home/prjemian/Documents/projects/prjemian/punx/punx/cache/v3.3   
Schema-3.4    user   2018-05-15 08:24:34 aa1ccd1 /home/prjemian/.config/punx/Schema-3.4                            
v2018.5       source 2018-05-15 16:34:19 a3045fd /home/prjemian/Documents/projects/prjemian/punx/punx/cache/v2018.5
v2020.1       user   2020-01-31 04:17:34 5c4cfec /home/prjemian/.config/punx/v2020.1                               
v2022.07      user   2022-08-02 06:43:48 e5e2347 /home/prjemian/.config/punx/v2022.07                              
main          user   2023-06-26 08:57:16 d669ffb /home/prjemian/.config/punx/main                                  
============= ====== =================== ======= ==================================================================

default NXDL file set:  main
$ punx validate --report WARN ../tiled-template/dev_sampler/nexus_punx/S2p5min_00070_00001.h5 

!!! WARNING: this program is not ready for distribution.

data file: ../tiled-template/dev_sampler/nexus_punx/S2p5min_00070_00001.h5
NeXus definitions: main, dated 2023-06-26 08:57:16, sha=d669ffb453ed5a89ca746f8d440adc1b9a5ecc05

findings
======================================== ====== ============= =======================================
address                                  status test          comments                               
======================================== ====== ============= =======================================
/entry/Metadata                          WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/AbsIntCoeff              WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/AbsInt_Standard          WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Beam_x_pixel             WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Beam_y_pixel             WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/CRL_A4                   WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/CenterBS_gain            WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/CenterBS_gainUnit        WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/CenterBS_phd             WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/DetZmotor                WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Detector_tilt_y          WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/ESAFNumber               WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Energy                   WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/EnergyThres1             WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/ExposureTime             WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/GISAXS_gain              WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/GISAXS_gainUnit          WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/GISAXS_phd               WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/GIWAXS_gain              WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/GIWAXS_gainUnit          WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/GIWAXS_phd               WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/GUPNumber                WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Heater_inUse             WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/IC1_phd                  WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/IC2_gain                 WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/IC2_gainUnit             WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/It_inUse                 WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/It_phd                   WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Lakeshore_Control_Temp   WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Lakeshore_Loop1_SetPoint WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Lakeshore_Loop2_SetPoint WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Lakeshore_Sample_Temp    WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/NDArrayEpicsTSSec        WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/NDArrayEpicsTSnSec       WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/NDArrayTimeStamp         WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/NDArrayUniqueId          WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Q_Standard               WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/SAXS_gain                WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/SAXS_gainUnit            WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/SAXS_phd                 WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/SDD                      WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/SRcurrent                WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Sample_DataName          WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Sample_Description       WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Sample_Name              WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Sample_Time              WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/UserName                 WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/Wavelength               WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/hexH                     WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/hexV                     WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/linkam_ci94_errors       WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/linkam_ci94_limit        WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/linkam_ci94_rate         WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/linkam_ci94_status       WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/linkam_ci94_temp         WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/linkam_ci94_temp2        WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/monoE                    WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/pinhH                    WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/pinhV                    WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/pixel_size               WARN   validItemName NXcollection contains non-NeXus content
/entry/Metadata/timestamp                WARN   validItemName NXcollection contains non-NeXus content
======================================== ====== ============= =======================================

summary statistics
======== ===== =========================================================== =========
status   count description                                                 (value)  
======== ===== =========================================================== =========
OK       620   meets NeXus specification                                   100      
NOTE     7     does not meet NeXus specification, but acceptable           75       
WARN     61    does not meet NeXus specification, not generally acceptable 25       
ERROR    0     violates NeXus specification                                -10000000
TODO     382   validation not implemented yet                              0        
UNUSED   0     optional NeXus item not used in data file                   0        
COMMENT  0     comment from the punx source code                           0        
OPTIONAL 220   allowed by NeXus specification, not identified              99       
         --                                                                         
TOTAL    1290                                                                       
======== ===== =========================================================== =========

<finding>=94.526432 of 908 items reviewed
NeXus definitions version: main

prjemian commented 10 months ago

Here's the assignment of the finding: https://github.com/prjemian/punx/blob/327192fb5ea0edf69699881c531ad4bc5c12b8d9/punx/validations/item_name.py#L172-L174

That code is called from the validator() method, which reports this table in its documentation: https://github.com/prjemian/punx/blob/327192fb5ea0edf69699881c531ad4bc5c12b8d9/punx/validations/item_name.py#L56-L63

I see an inconsistency here between the punx documentation and possibly the assignment of the finding. Visually, these names appear to pass the regular expression pattern for NXDL group and field names.
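
The visual check can also be done mechanically. The sketch below is an assumption-laden illustration: the pattern is written from memory of the `validItemName` rule in the NXDL schema and should be verified against the definitions version in use, and the helper name `passes` is hypothetical. The sample names are taken from the findings table above.

```python
import re

# Assumed pattern (recollection of the NXDL validItemName rule);
# verify it against the installed NeXus definitions before relying on it.
VALID_ITEM_NAME = re.compile(r"^[a-zA-Z0-9_]([a-zA-Z0-9_.]*[a-zA-Z0-9_])?$")

# Names copied from the WARN findings reported above.
names = ["Metadata", "AbsIntCoeff", "linkam_ci94_temp2", "NDArrayEpicsTSnSec"]


def passes(name):
    """True if the name matches the assumed pattern (hypothetical helper)."""
    return VALID_ITEM_NAME.match(name) is not None


for name in names:
    print(name, passes(name))
```

If the assumed pattern is right, every one of these names matches, which supports the suspicion that the WARN status does not come from the name itself.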

prjemian commented 10 months ago

What's the output with your file?

punx --version
punx validate --report WARN path/to/your/NeXus/file.h5

prjemian commented 10 months ago

Also, punx should not be reporting on content in NXcollection since it has these special rules:

        ignoreExtraGroups="true"
        ignoreExtraFields="true"
        ignoreExtraAttributes="true"

That is the real problem here. Sorry I did not see that earlier.
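
One way to picture the intended behavior is the sketch below. It is not punx's implementation. The XML is a simplified, namespace-free stand-in for the NXcollection header, the function names are hypothetical, `/entry/Metadata` is assumed to be the NXcollection group, and the `/entry/data` finding is made up for contrast. When all three `ignoreExtra*` flags are "true", findings for items inside the collection are dropped.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the NXcollection NXDL header (assumption: the real
# file has an XML namespace and more attributes than shown here).
NXDL_FRAGMENT = """
<definition name="NXcollection"
        ignoreExtraGroups="true"
        ignoreExtraFields="true"
        ignoreExtraAttributes="true" />
"""

IGNORE_FLAGS = ("ignoreExtraGroups", "ignoreExtraFields", "ignoreExtraAttributes")


def ignores_extra_content(nxdl_text):
    """True if every ignoreExtra* attribute on the root element is "true"."""
    root = ET.fromstring(nxdl_text.strip())
    return all(root.get(flag) == "true" for flag in IGNORE_FLAGS)


def drop_collection_findings(findings, collection_paths, nxdl_text=NXDL_FRAGMENT):
    """Remove findings whose address lies inside one of the given groups."""
    if not ignores_extra_content(nxdl_text):
        return list(findings)
    prefixes = tuple(path.rstrip("/") + "/" for path in collection_paths)
    return [f for f in findings if not f[0].startswith(prefixes)]


findings = [
    ("/entry/Metadata/AbsIntCoeff", "WARN"),
    ("/entry/Metadata/Beam_x_pixel", "WARN"),
    ("/entry/data", "OK"),  # hypothetical finding outside the collection
]
kept = drop_collection_findings(findings, ["/entry/Metadata"])
print(kept)  # [('/entry/data', 'OK')]
```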

biochem-fan commented 10 months ago

Yes, it is acceptable. Too bad this is not in the list of frequently asked questions. I'll fix that soon. The easiest reference to find in the NeXus User Guide is in the NeXus Class Definitions section.

I see. That is a relief.

Would you agree that the wording here seems a bit strong and could be relaxed a bit? Instead of "not generally acceptable", it could report "not generally recognized".

Yes, that would be clearer. Thank you very much.

punx --version

!!! WARNING: this program is not ready for distribution.
0.3.4

punx validate --report WARN path/to/your/NeXus/file.h5

!!! WARNING: this program is not ready for distribution.

data file: 377.nxs
NeXus definitions: main, dated 2023-06-26 08:57:16, sha=d669ffb453ed5a89ca746f8d440adc1b9a5ecc05

findings
============================================================================= ====== ============= =======================================
address                                                                       status test          comments
============================================================================= ====== ============= =======================================
/entry/instrument/detector/detectorSpecific                                   WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/auto_summation                    WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/calibration_type                  WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/compression                       WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/countrate_correction_bunch_mode   WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/countrate_correction_count_cutoff WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/data_collection_date              WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/detector_readout_period           WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/eiger_fw_version                  WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/element                           WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/flatfield                         WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/frame_count_time                  WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/frame_period                      WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/module_bandwidth                  WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/nframes_sum                       WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/nimages                           WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/nsequences                        WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/ntrigger                          WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/number_of_excluded_pixels         WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/photon_energy                     WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/pixel_mask                        WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/roi_mode                          WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/software_version                  WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/summation_nimages                 WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/test_mode                         WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/trigger_mode                      WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/x_pixels_in_detector              WARN   validItemName NXcollection contains non-NeXus content
/entry/instrument/detector/detectorSpecific/y_pixels_in_detector              WARN   validItemName NXcollection contains non-NeXus content
============================================================================= ====== ============= =======================================

summary statistics
======== ===== =========================================================== =========
status   count description                                                 (value)
======== ===== =========================================================== =========
OK       441   meets NeXus specification                                   100
NOTE     5     does not meet NeXus specification, but acceptable           75
WARN     28    does not meet NeXus specification, not generally acceptable 25
ERROR    2     violates NeXus specification                                -10000000
TODO     67    validation not implemented yet                              0
UNUSED   0     optional NeXus item not used in data file                   0
COMMENT  0     comment from the punx source code                           0
OPTIONAL 215   allowed by NeXus specification, not identified              99
         --
TOTAL    758
======== ===== =========================================================== =========

<finding>=-28847.380608 of 691 items reviewed
NeXus definitions version: main
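
The two `<finding>` numbers in this thread are consistent with a weighted mean of the per-status values over all items except those marked TODO. That reading is inferred from the summary tables, not taken from the punx source, and `finding_score` is a hypothetical name. A minimal sketch under that assumption:

```python
# Per-status values as printed in the "(value)" column of the summary tables.
VALUES = {
    "OK": 100, "NOTE": 75, "WARN": 25, "ERROR": -10000000,
    "TODO": 0, "UNUSED": 0, "COMMENT": 0, "OPTIONAL": 99,
}


def finding_score(counts):
    """Weighted mean of status values, skipping TODO items (inferred rule)."""
    reviewed = sum(n for status, n in counts.items() if status != "TODO")
    total = sum(VALUES[status] * n for status, n in counts.items() if status != "TODO")
    return total / reviewed, reviewed


# Counts from the S2p5min_00070_00001.h5 report.
first = {"OK": 620, "NOTE": 7, "WARN": 61, "ERROR": 0, "TODO": 382,
         "UNUSED": 0, "COMMENT": 0, "OPTIONAL": 220}
# Counts from the 377.nxs report.
second = {"OK": 441, "NOTE": 5, "WARN": 28, "ERROR": 2, "TODO": 67,
          "UNUSED": 0, "COMMENT": 0, "OPTIONAL": 215}

for counts in (first, second):
    score, reviewed = finding_score(counts)
    print(f"<finding>={score:.6f} of {reviewed} items reviewed")
```

With these counts the sketch prints 94.526432 of 908 and -28847.380608 of 691, the same figures as the reports. It also shows that the strongly negative score for 377.nxs comes from the 2 ERROR findings, not from the 28 WARN findings on detectorSpecific.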