zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License
How to store on disk an in-memory zarr #100

Closed mratsim closed 7 years ago

mratsim commented 7 years ago

Hi alimanfoo,

Great work on your library. I have a library (vtk for medical image visualization) that outputs a 3D NumPy array. I want to store it on disk, but there doesn't seem to be a "convert", "save" or "export" function.

I also tried the following, where data is a 3D NumPy array and outFile is path/to/my/image.zarr:

z = zarr.array(data,
    path=outFile, dtype='int16',
    chunks=(100, 100, 100), compression='blosc', compression_opts=dict(cname='zstd'))

I get no error, z is created in memory, but nothing on disk.

For reference, bcolz has a rootdir parameter that seems to do what I want.

alimanfoo commented 7 years ago

Hi Mamy,

The easiest way to create a Zarr array using the filesystem for storage is to use the open_array function. There is an example in the tutorial section on persistent arrays.

If you have several arrays you want to store on disk, you might find it convenient to use groups. The easiest way to create a persistent group is via the open_group function.

Both open_array and open_group are actually convenience functions. An alternative way of creating on-disk arrays is to pass a DirectoryStore as the value of the store keyword argument to any of the array creation functions. E.g.:

import zarr
store = zarr.DirectoryStore('/path/to/data')
z = zarr.array(data, store=store, dtype='int16', chunks=(100, 100, 100), compressor=zarr.Blosc(cname='zstd'))

Hth.

mratsim commented 7 years ago

Thank you very much,

Your DirectoryStore example worked. Using open_array or open_group wouldn't be as easy, since I would have to compute the proper shape, pass it as a parameter, and then copy the NumPy array into the Zarr array.

Looking at the documentation, there is actually this example that I missed:

store = zarr.ZipStore('example.zip', mode='w')
z = zarr.zeros((1000, 1000), chunks=(100, 100), dtype='i4', store=store) 

but there is no example of exporting/writing an existing array to a DirectoryStore.

I'm creating a PR for that.