uchicago-cs / deepdish

Flexible HDF5 saving/loading and other data science tools from the University of Chicago
http://deepdish.io
BSD 3-Clause "New" or "Revised" License

Appending values to a numpy array in an HDF5 file created by deepdish.io #22

Open Omer80 opened 8 years ago

Omer80 commented 8 years ago

Hi, I like the deepdish.io interface to HDF5 and would like to use it to store numpy arrays on the go, while a simulation is running. Is it possible to append values to a numpy array that I've already saved?

That is, I would like to create a dictionary of the form {u1: numpy_array1, u2: numpy_array2}, in which each numpy array has, for example, shape (128, 128). At each iteration of my code, I would like to append another (128, 128) array along a third axis.

Shaunakde commented 8 years ago

So far I have gotten around this by doing an expensive extract, append, and write cycle. I will look into this tonight; it might help both of us.

SD

gustavla commented 8 years ago

This is currently not supported. However, the HDF5 standard does make this possible using chunking. That is, an array is stored as separate chunks, so when the array is extended, its data does not have to be contiguous in the file (keeping it contiguous would require a full re-allocation of the file). I know the PyTables backend can set chunking, so I think deepdish could support this. This feature request has come up before and I think it would be really neat, so I will treat it as high priority.
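To make the chunking idea concrete, here is a minimal sketch that uses PyTables directly rather than deepdish. It assumes an extendable array (`EArray`) with one chunk per appended (128, 128) frame; the file name `sim.h5`, the node name `u1`, and the chunk shape are illustrative choices, not anything deepdish does today.

```python
import os
import tempfile

import numpy as np
import tables

# Illustrative file location (assumption, not a deepdish convention).
path = os.path.join(tempfile.mkdtemp(), "sim.h5")

with tables.open_file(path, mode="w") as h5:
    # A 0 in the shape marks the extendable axis; here it is the third one.
    u1 = h5.create_earray(
        h5.root,
        "u1",
        atom=tables.Float64Atom(),
        shape=(128, 128, 0),
        chunkshape=(128, 128, 1),  # one chunk per frame (assumed choice)
    )
    for step in range(3):
        frame = np.full((128, 128), float(step))
        # Appended data must match the array's shape on every axis
        # except the extendable one, so add a length-1 third axis.
        u1.append(frame[:, :, np.newaxis])

with tables.open_file(path, mode="r") as h5:
    stacked = h5.root.u1.read()
```

Each `append` call only writes the new chunk, so the frames already on disk are not rewritten.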

If I don't get to this soon, anyone is welcome to propose a PR.

There are some design decisions that need to be made first. The chunking size needs to be known the first time you save the file. Either deepdish sets it to something appropriate, or the user explicitly specifies a chunk size. What is the API for extending an array? Perhaps a separate call altogether, like dd.io.extend('test.h5', '/foo', x, axis=0)?
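As a rough illustration of what such a separate call could look like, the sketch below implements a standalone `extend` function on top of PyTables. Both `create_extendable` and `extend` are hypothetical helpers written for this example; they are not part of deepdish. The sketch assumes a non-negative `axis`, lets PyTables pick the chunk shape, and fixes the extendable axis when the array is first created, which mirrors the constraint that the layout has to be decided at first save.

```python
import os
import tempfile

import numpy as np
import tables


def create_extendable(path, node_path, first, axis=0):
    """Hypothetical helper: store `first` as an array that can grow along `axis`."""
    first = np.asarray(first)
    shape = list(first.shape)
    shape[axis] = 0  # a 0 marks the extendable axis in PyTables
    where, name = node_path.rsplit("/", 1)
    with tables.open_file(path, mode="a") as h5:
        arr = h5.create_earray(
            where or "/",
            name,
            atom=tables.Atom.from_dtype(first.dtype),
            shape=tuple(shape),
            createparents=True,
        )
        arr.append(first)


def extend(path, node_path, x, axis=0):
    """Hypothetical helper: append `x` to an existing extendable array."""
    with tables.open_file(path, mode="a") as h5:
        node = h5.get_node(node_path)
        if not isinstance(node, tables.EArray):
            raise TypeError(f"{node_path} was not saved as an extendable array")
        if node.extdim != axis:
            raise ValueError(
                f"{node_path} can only grow along axis {node.extdim}, not {axis}"
            )
        node.append(np.asarray(x))


# Usage: create a (2, 4) array, then append three more rows.
path = os.path.join(tempfile.mkdtemp(), "test.h5")
create_extendable(path, "/foo", np.zeros((2, 4)), axis=0)
extend(path, "/foo", np.ones((3, 4)), axis=0)

with tables.open_file(path, mode="r") as h5:
    result = h5.root.foo.read()
```

One consequence of this design is that an array saved as a regular fixed-size array cannot be extended later; in this sketch `extend` raises an error in that case instead of silently rewriting the data.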

Omer80 commented 8 years ago

I like the idea of creating a separate call for extending an array.

Omer80 commented 7 years ago

Hi guys, I wanted to ask whether there's any update on the option of appending data to a numpy array inside an HDF5 file created with deepdish.io.