loading scalar values into a dictionary from hdf5 groups

uchicago-cs / deepdish

Flexible HDF5 saving/loading and other data science tools from the University of Chicago

BSD 3-Clause "New" or "Revised" License


First, I would like to thank you for writing such a nice package for interfacing with hdf5 files.

I have been using HDF5 files to store data, and I realized that I could use your package to load HDF5 files that it did not create. The only problem was loading scalar values stored inside HDF5 groups into a dictionary. It appears that the deepdish io package expects scalars to be stored as numpy arrays of length 1. To handle this, I added a couple of lines to the '_load_nonlink_level' method so that if a value is a scalar instead of an array, the value of the scalar is returned.

I don't think your code needs to be modified, but other people might find the tweak useful if they have HDF5 files created in a different way and would like to use your package to load them. I've attached the 'hdf5io.py' that I modified. The lines I added are at line 439.

hdf5io.py.txt
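The attachment is not reproduced in this thread. As a rough sketch of the kind of check described above (this is not the actual patch, and the helper name `unpack_scalar` is hypothetical), a value that arrives as a numpy array of shape () can be unpacked to its single element while ordinary arrays pass through unchanged:

```python
import numpy as np


def unpack_scalar(value):
    """Hypothetical helper: unpack a 0-d numpy array, pass anything else through."""
    # A numpy array of shape () has ndim == 0; indexing it with an empty
    # tuple returns the single element it holds instead of an array.
    if isinstance(value, np.ndarray) and value.ndim == 0:
        return value[()]
    return value


scalar_like = np.array(3.5)       # shape (), as a scalar dataset might be read
regular = np.array([1, 2, 3])     # shape (3,), an ordinary array

unpacked = unpack_scalar(scalar_like)
untouched = unpack_scalar(regular)

print(type(unpacked), unpacked)
print(type(untouched), untouched.shape)
```

Where such a check would sit inside '_load_nonlink_level' depends on the surrounding deepdish code, which is not shown here.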

Scalars are stored by deepdish as attributes in HDF5. This includes regular Python scalars (e.g. 100), Numpy scalars (e.g. np.int32(100)), and Numpy array scalars (e.g. np.array(100)). They will all be read back as a Numpy scalar. So, as far as I can tell, deepdish never stores arrays of shape (), and I see no problem in adding this unpacking behavior to deepdish. Reading non-deepdish HDF5 files well is a goal of deepdish.

I just noticed one thing I hadn't considered until now. If you store a very large integer (>64 bits), it will be stored as np.array(1234234..., dtype=object). This is actually a numpy array of shape (), so I take back what I said about deepdish never storing such arrays. Fixing this would also fix the big integer problem. Thanks for pointing this out and offering a solution. I will push a commit with your suggested fix (unless you want to submit it as a PR).
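The big integer case can be checked with numpy alone. The snippet below is only an illustration with made-up variable names, not deepdish code: wrapping an integer that does not fit in 64 bits gives an object array of shape (), and indexing it with an empty tuple recovers the original Python int.

```python
import numpy as np

big = 2**100                      # does not fit in any 64-bit integer type

arr = np.array(big)               # numpy falls back to dtype=object
print(arr.shape, arr.dtype)       # shape is the empty tuple, dtype is object

# Indexing a 0-d array with an empty tuple returns the stored element,
# which for an object array is the original Python int.
restored = arr[()]
print(type(restored), restored == big)

# For comparison, a small integer wrapped the same way is also 0-d,
# but gets a fixed-width integer dtype instead of object.
small = np.array(100)
print(small.shape, small.dtype.kind)
```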


loading scalar values into a dictionary from hdf5 groups #19