scikit-hep / awkward-0.x

Manipulate arrays of complex data structures as easily as Numpy.
BSD 3-Clause "New" or "Revised" License

Reduction of empty elements #237

Closed mverzett closed 4 years ago

mverzett commented 4 years ago

As you might have guessed, I'm currently dealing with a lot of empty elements and trying to find the best way to handle them. While trying a few things, I stumbled across this behaviour; nan would probably be more appropriate in this case:

>>> import awkward
>>> a = awkward.fromiter([[1,2], [3,4], [5,], [6,7]])
>>> b = awkward.fromiter([[1],[],[7],[8]])
>>> l, r = a.cross(b, nested=True).unzip()
>>> import numpy as np
>>> diff = np.abs(l - r)
>>> diff
<JaggedArray [[[0] [1]] [[] []] [[2]] [[2] [1]]] at 0x7fe9dab4d290>
>>> diff.min.__doc__
>>> diff.min()
<JaggedArray [[0 1] [9223372036854775807 9223372036854775807] [2] [2 1]] at 0x7fe9e6dfa1d0>

jpivarski commented 4 years ago

Actually, the identity of min is inf, but inf, -inf, and nan can only be represented in floating-point types, and you have integers. Therefore, we have to use the largest integer representable in this type.
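
To illustrate the point about identities, here is a minimal sketch in plain Python rather than awkward itself. The helper `min_with_identity` and the constant `INT64_MAX` are hypothetical names introduced only for this example, and it assumes the integers in the report are signed 64-bit, which is consistent with the 9223372036854775807 shown in the output above.

```python
from functools import reduce

# Largest value of a signed 64-bit integer (assumed dtype of the arrays above).
INT64_MAX = 2**63 - 1


def min_with_identity(values, identity):
    """Fold `min` over `values`, starting from `identity`.

    For an empty input the fold never runs, so the identity itself comes back.
    """
    return reduce(min, values, identity)


# Floating point can use inf as the identity of min.
float_result = min_with_identity([], float("inf"))

# Integers have no inf, so the largest representable integer stands in for it.
int_result = min_with_identity([], INT64_MAX)

# A non-empty input is unaffected by the choice of identity.
nonempty_result = min_with_identity([2, 1], INT64_MAX)

print(float_result, int_result, nonempty_result)
```

With this kind of fold, an empty list comes back as the identity value, which is why the empty elements show up as the maximum integer.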

It's ugly, and I'm changing it. In the future, min and max will be "semigroup reducers," without an identity element, and these will show up as None. (It will be a masked array with None covering up the extreme value, whether that's inf or the largest possible integer.)
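
A rough sketch of what a reducer without an identity could look like, again in plain Python and not the actual awkward implementation. The function `min_or_none` is a hypothetical name, and the nested list is copied from the `diff` array in the report.

```python
def min_or_none(values):
    """Return the minimum of `values`, or None when there is nothing to reduce."""
    return min(values, default=None)


# Same structure as `diff` in the report above.
nested = [[[0], [1]], [[], []], [[2]], [[2], [1]]]

# Reduce the innermost lists; empty ones become None instead of an extreme value.
reduced = [[min_or_none(inner) for inner in outer] for outer in nested]

print(reduced)  # [[0, 1], [None, None], [2], [2, 1]]
```

Here the empty elements are reported as missing rather than as a sentinel that could be mistaken for real data.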

mverzett commented 4 years ago

Yeah, makes sense, thanks!