mean, std fail on ChunkedArrays

scikit-hep / awkward-0.x

Manipulate arrays of complex data structures as easily as Numpy.

BSD 3-Clause "New" or "Revised" License

>>> import awkward
>>> import numpy as np
>>> chunks = [
...     np.arange(5),
...     np.arange(5),
...     np.arange(5)
... ]
>>> a = awkward.ChunkedArray(chunks, [5, 5, 5])
>>> a.mean()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../venv/lib64/python3.6/site-packages/awkward/array/base.py", line 223, in mean
    return self.numpy.true_divide(self.sum(), self.count())
  File ".../venv/lib64/python3.6/site-packages/awkward/array/base.py", line 196, in count
    return self._reduce(None, 0, None)
  File ".../venv/lib64/python3.6/site-packages/awkward/array/chunked.py", line 698, in _reduce
    this = self._util_reduce(chunk[:self._chunksizes[chunkid]], ufunc, identity, dtype)
  File ".../venv/lib64/python3.6/site-packages/awkward/array/base.py", line 554, in _util_reduce
    return ufunc.reduce(array, axis=None)
AttributeError: 'NoneType' object has no attribute 'reduce'
>>> b = awkward.fromiter(np.arange(15))
>>> b.mean()
7.0

>>> np.mean(b)
7.0
>>> np.mean(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 6, in mean
  File ".../venv/lib64/python3.6/site-packages/numpy/core/fromnumeric.py", line 3370, in mean
    return mean(axis=axis, dtype=dtype, out=out, **kwargs)
TypeError: mean() got an unexpected keyword argument 'axis'

We should be phasing out the use of Awkward 0, especially for ChunkedArrays, which were very hard to get right in Awkward 0 (lots of NumPy-based edge cases).

Just verifying that all of this is fine in Awkward 1 (see ak.partitioned for this array-creation routine):

>>> import awkward1 as ak
>>> import numpy as np
>>> a = ak.partitioned(lambda n: ak.Array(np.arange(5)), 3)
>>> a
<Array [0, 1, 2, 3, 4, 0, ... 4, 0, 1, 2, 3, 4] type='15 * int64'>
>>> np.mean(a)
2.0

And if the original array came from Awkward 0, use ak.from_awkward0:

>>> import awkward
>>> chunks = [
...     np.arange(5),
...     np.arange(5),
...     np.arange(5)
... ]
>>> a = awkward.ChunkedArray(chunks, [5, 5, 5])
>>> a
<ChunkedArray [0 1 2 ... 2 3 4] at 0x7f3622f65f10>
>>> a1 = ak.from_awkward0(a)
>>> a1
<Array [0, 1, 2, 3, 4, 0, ... 4, 0, 1, 2, 3, 4] type='15 * int64'>
>>> np.mean(a1)
2.0

Note: there's a new restriction that chunking is only allowed at the top level of a data structure (and hence it has been renamed to "partitioning"). This is partly to reduce implementation complexity, but also because the value of having an array in partitions is that different jobs can work on different pieces, or the pieces can be loaded one at a time (i.e. either for compute efficiency or for staying within memory limits), and both of these only need chunking at the top level. With the default keeplayout=False in ak.from_awkward0, deeply nested ChunkedArrays are concatenated; with keeplayout=True, they would raise an error.

Note on the note: we'll be changing the name of keeplayout to keep_layout (scikit-hep/awkward-1.0#310). This is an oversight in adapting to cleaner names.
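
To illustrate why top-level partitions are enough for reductions like mean and std, here is a minimal sketch that visits one chunk at a time and only keeps running totals. It is not how Awkward implements these reductions; `chunked_mean_std` is a hypothetical helper written for this example, and it uses the simple sum-of-squares formula, which can lose precision on data with a large mean and small spread.

```python
import math


def chunked_mean_std(chunks):
    """Return (mean, population standard deviation) over all values in `chunks`.

    `chunks` is any iterable of numeric sequences (lists, ranges, NumPy arrays).
    Only one chunk is examined at a time; the chunks are never concatenated.
    Hypothetical helper for illustration, not part of Awkward or NumPy.
    """
    count = 0
    total = 0.0
    total_sq = 0.0
    for chunk in chunks:
        for x in chunk:
            value = float(x)
            count += 1
            total += value
            total_sq += value * value
    if count == 0:
        raise ValueError("cannot take the mean of zero values")
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clamp at zero to absorb rounding error.
    variance = max(total_sq / count - mean * mean, 0.0)
    return mean, math.sqrt(variance)


# Plain ranges stand in for the three np.arange(5) chunks used in this thread.
mean, std = chunked_mean_std([range(5), range(5), range(5)])
print(mean)  # 2.0
```

The standard deviation here is the population form (dividing by the count), which is also what np.std computes by default.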

mean, std fail on ChunkedArrays #249