prjemian / punx

Python Utilities for NeXus HDF5 files
https://prjemian.github.io/punx

Is this dependent on the h5py version? #133

Closed prjemian closed 2 years ago

prjemian commented 3 years ago

While testing with the BES XPCS working group, I discovered that strings written with create_dataset() had type "|O" instead of "|Snn", where nn is the string length. Is that part of the changes in h5py v3? After uninstalling that version and installing v2.10.0, the test passed. This is a schematic of the test:

writer:

    h5root.create_dataset("item_name", data="the value")

tester:

    assert h5root["item_name"].value == "the value"

prjemian commented 3 years ago

What's new in h5py? See https://docs.h5py.org/en/latest/whatsnew/index.html
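For context on the question above: the h5py 3.0 release notes list two changes that affect the schematic test. The `Dataset.value` property was removed (use `ds[()]` instead), and variable-length strings are read back as `bytes` rather than `str`. A minimal sketch of a read that should work under both versions follows. The names `as_text` and `roundtrip` are hypothetical helpers, not part of punx or h5py, and the file name is arbitrary.

```python
def as_text(value):
    """Return a str for a value read from an HDF5 string dataset.

    h5py v3 returns bytes for variable-length strings,
    while h5py v2 returns str; accept either.
    """
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


def roundtrip(filename="issue133_demo.h5"):
    """Write a scalar string dataset, then read it back as str."""
    # Imported here so as_text() stays usable without h5py installed.
    import h5py

    with h5py.File(filename, "w") as h5root:
        h5root.create_dataset("item_name", data="the value")

    with h5py.File(filename, "r") as h5root:
        # ds[()] reads a scalar dataset; it replaces the removed .value
        return as_text(h5root["item_name"][()])
```

With this, the tester's assertion could be written as `assert roundtrip() == "the value"` without depending on the installed h5py major version.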

prjemian commented 2 years ago

Version installed today:

(bluesky_2021_2) prjemian@zap:~/.../prjemian/punx$ conda list h5py
# packages in environment at /home/prjemian/.conda/envs/bluesky_2021_2:
#
# Name                    Version                   Build  Channel
h5py                      3.2.1            py38h6c542dc_0  

Could build a test centered on this snippet:

    h5root.create_dataset("item_name", data="the value")
    assert h5root["item_name"].value == "the value"

and modify the environment file (https://github.com/prjemian/punx/blob/ca5b0212cf0cfe6ef9da46ddf0a902033d32b882/environment.yml#L8) to use:

    - h5py>=3

prjemian commented 2 years ago

Could also add a unit test that the environment keeps this additional restriction. Might be harder to keep that test in sync with future upgrades.
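A rough sketch of what such a test might look like, assuming the restriction lives on a `- h5py...` line in `environment.yml`. The function name `find_requirement` and the `SAMPLE` text are made up for illustration; the real file contents are not shown in this thread.

```python
import re


def find_requirement(environment_text, package="h5py"):
    """Return the version spec for `package` ('' if unpinned, None if absent)."""
    pattern = re.compile(
        r"^\s*-\s*" + re.escape(package) + r"\s*([<>=!~].*)?$"
    )
    for line in environment_text.splitlines():
        line = line.split("#", 1)[0].rstrip()  # ignore trailing comments
        match = pattern.match(line)
        if match:
            return (match.group(1) or "").strip()
    return None


# Hypothetical environment file, for illustration only.
SAMPLE = """\
name: punx
dependencies:
  - python>=3.7
  - h5py>=3
  - numpy
"""


def test_h5py_restriction():
    # The expected spec must be updated whenever the environment changes.
    assert find_requirement(SAMPLE) == ">=3"
```

In a real test, `SAMPLE` would be replaced by the text read from `environment.yml`. The hard-coded expected spec is exactly what makes the test tedious to keep in sync with upgrades.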

prjemian commented 2 years ago

Also, create a custom conda environment with h5py v2, then create a test HDF5 file with that version.

(h5py2) prjemian@opp:~$ python
Python 3.8.12 (default, Oct 12 2021, 13:49:34) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5py
>>> h5py.__version__
'2.10.0'
>>> with h5py.File("issue133_h5py2.h5", "w") as root:
...     root.create_dataset("item_name", data="the value")
... 
<HDF5 dataset "item_name": shape (), type "|O">

prjemian commented 2 years ago

Looks the same with h5py v3:

(bluesky_2021_2) prjemian@zap:/tmp$ python
Python 3.8.10 (default, May 19 2021, 18:05:58) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5py
>>> h5py.__version__
'3.2.1'
>>> with h5py.File("issue133_h5py3.h5", "w") as root:
...     root.create_dataset("item_name", data="the value")
... 
<HDF5 dataset "item_name": shape (), type "|O">
>>> exit()

Will test both.

prjemian commented 2 years ago

No further unit test is needed: the h5dump output for the two files is identical apart from the file name.

(bluesky_2021_2) prjemian@zap:~/.../prjemian/punx$ h5dump issue133_h5py3.h5
HDF5 "issue133_h5py3.h5" {
GROUP "/" {
   DATASET "item_name" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "the value"
      }
   }
}
}
(bluesky_2021_2) prjemian@zap:~/.../prjemian/punx$ h5dump /tmp/issue133_h5py2.h5
HDF5 "/tmp/issue133_h5py2.h5" {
GROUP "/" {
   DATASET "item_name" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "the value"
      }
   }
}
}

Not adding these to the unit tests.
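For the record, the by-eye comparison above could be scripted if it is ever needed again. This is only a sketch: `dump_body` and `same_structure` are hypothetical names, and the two dump strings are copied from the h5dump output in this thread rather than generated.

```python
def dump_body(h5dump_text):
    """Return h5dump output lines without the leading 'HDF5 "<file>" {' line."""
    lines = [line.rstrip() for line in h5dump_text.strip().splitlines()]
    if lines and lines[0].startswith('HDF5 "'):
        lines = lines[1:]  # the first line differs only by file name
    return lines


def same_structure(dump_a, dump_b):
    """True when two h5dump outputs match apart from the file name."""
    return dump_body(dump_a) == dump_body(dump_b)


# Shared portion of the two dumps shown in this thread.
BODY = """GROUP "/" {
   DATASET "item_name" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "the value"
      }
   }
}
}
"""

DUMP_V3 = 'HDF5 "issue133_h5py3.h5" {\n' + BODY
DUMP_V2 = 'HDF5 "/tmp/issue133_h5py2.h5" {\n' + BODY

print(same_structure(DUMP_V3, DUMP_V2))
```

The dump text could instead be captured with `subprocess.run(["h5dump", filename], capture_output=True, text=True).stdout`, assuming `h5dump` is on the path.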