zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License
1.54k stars 286 forks source link

Directly allow zarr arrays as "out" parameter of ufuncs (if possible) #566

Open s-m-e opened 4 years ago

s-m-e commented 4 years ago

I just ran into the following "problem":

import numpy as np
import zarr
a = np.arange(0, 16, dtype = 'float32').reshape(4, 4)
b = zarr.open('test.zarr', mode = 'w', shape = (4, 4), chunks = (2, 2), dtype = 'f4')
np.multiply(a, 2, out = b[:, :]) # does not change file(s) on disk
np.multiply(a, 2, out = b) # TypeError: return arrays must be of ArrayType

Some ufuncs (but not those from numpy itself) allow zarr arrays as out-parameters. I.e. the result of the computation is directly written to disk. numpy in fact rejects zarr arrays because they are not of "ArrayType". Could this be changed? I am not entirely sure given the nature of numpy's memory management.

alimanfoo commented 4 years ago

Thanks @s-m-e, this might be one to raise with the numpy folks, I don't know how easy it would be for numpy to allow zarr arrays as out parameters.

jakirkham commented 4 years ago

Maybe @rgommers would know? 🙂

rgommers commented 4 years ago

Some ufuncs (but not those from numpy itself) allow zarr arrays as out-parameters

Interesting. Can you point to an example @s-m-e? Other ufuncs, like in scipy.special, typically use the same numpy machinery as numpy itself to create ufuncs, so this is surprising to me.

ufuncs aren't going to be writing to disk, I think it requires ndarray or a subclass of it.

s-m-e commented 4 years ago

@rgommers I have been wondering about this, too. I have been digging through old code and so far I believe I have been "a victim" of Python functions "pretending" to be a (C-) ufunc and clever duck typing - i.e. something along those lines:

a = np.arange(10, 18, dtype = 'f4')
b = zarr.zeros((8,), chunks = (2,), dtype = 'f4')

isarray = lambda var: all((hasattr(var, attr) for attr in ('dtype', 'shape', 'ndim', 'astype')))
isscalar = lambda var: any((isinstance(var, number) for number in (int, float, np.number)))

def multiply_py(x, y, out = None, order = 'K', dtype = None):
    assert isarray(x)
    assert x.ndim == 1
    assert isscalar(y)
    assert out is None or isarray(out)
    if out is None:
        out = np.zeros(x.shape, order = order, dtype = dtype if dtype is not None else x.dtype)
    for i in range(x.shape[0]):
        out[i] = x[i] * y
    return out

multiply_py(a, 2, out = b)

I am trying to locate an example in an open source library.