Closed simongregorebner closed 8 years ago
Great, this is really useful. A few comments:
Can you change the heading from "Example" to "h5py Example", or something of that ilk? While most people seem to use the h5py interface, in principle it is only one part of the library.
Can you add the following line, filling in X and Y?
print h5py.__version__  # '2.X.Y'
I'm not sure when h5py started supporting arbitrary filters in the high-level interface, but it is recent, and I have been using the low-level interface (see all the annoying code I had to write in bitshuffle/h5.pyx, which I guess is obsolete now).
I would stylize the create_dataset line as follows, as the long line is difficult to read:
dataset = filehandle.create_dataset(
    "data",
    (100, 100, 100),
    maxshape=(None, 100, 100),
    compression=32008,
    compression_opts=(block_size, bitshuffle.h5.H5_COMPRESS_LZ4),
    chunks=(1, 100, 100),
    dtype='float32',
)
If giving a minimal example, why specify maxshape?
Why only fill the first chunk with data? Might as well use array = numpy.random.rand(100, 100, 100) and dataset[:] = array.
The h5py documentation generally uses the variable name f for File objects. It is less verbose, but note that an h5py.File is not really a file handle.
Note that the 'official' recommendation is to set block_size = 0 and let Bitshuffle choose its value (which in your example comes out to be 2048 anyway). I would follow that, and add what the special value 0 means in a comment.
Please change 32008 to bitshuffle.h5.H5FILTER.
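Putting the suggestions above together, the README example might end up looking roughly like the sketch below. This is only an illustration, not the final README text: the helper names bitshuffle_dataset_kwargs and write_example are hypothetical and exist only to keep the sketch easy to check, the print call uses Python 3 syntax, and it assumes an h5py version recent enough to accept an integer filter ID in the high-level interface.

```python
def bitshuffle_dataset_kwargs(filter_id, compressor_id, block_size=0):
    """Build the compression keywords for h5py's create_dataset.

    block_size=0 is the special value that lets Bitshuffle choose the
    block size itself, which is the recommendation made in the review.
    (Hypothetical helper, not part of bitshuffle or h5py.)
    """
    return {
        "compression": filter_id,
        "compression_opts": (block_size, compressor_id),
    }


def write_example(path):
    """Write a Bitshuffle+LZ4 compressed dataset to `path` (hypothetical helper)."""
    # Imported lazily so the helper above can be used and tested on its own.
    import h5py
    import numpy
    import bitshuffle.h5

    # Record the h5py version, since filter support in the
    # high-level interface is reported to be recent.
    print(h5py.__version__)

    array = numpy.random.rand(100, 100, 100).astype("float32")

    # `f` follows the naming used in the h5py documentation.
    with h5py.File(path, "w") as f:
        dataset = f.create_dataset(
            "data",
            (100, 100, 100),
            chunks=(1, 100, 100),
            dtype="float32",
            **bitshuffle_dataset_kwargs(
                bitshuffle.h5.H5FILTER,
                bitshuffle.h5.H5_COMPRESS_LZ4,
            ),
        )
        dataset[:] = array  # fill every chunk, not just the first
```

Calling write_example("example.h5") requires h5py, numpy and bitshuffle to be installed. The maxshape argument is left out because a minimal example does not need a resizable dataset.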
If you have no objections, either you or I could make the proposed changes.
I updated the parts of the code you mentioned. Feel free to make further changes as you like...
Added a usage example to the README, as I found it difficult to figure out how to use the library correctly from Python.