scikit-hep / awkward-0.x

Manipulate arrays of complex data structures as easily as Numpy.
BSD 3-Clause "New" or "Revised" License

IndexedMaskedArray corner cases #217

Open nsmith- opened 4 years ago

nsmith- commented 4 years ago
>>> import awkward as ak
>>> ak.__version__
'0.12.17'
>>> a = ak.fromiter([[1, 2], [], [-2, 4], [4, -3], [-2, -1]])
>>> b = ak.fromiter([[2, 4], [1], [-1, 1], [], [-3, -4]])
>>> af = a[a.argmax()].pad(1, clip=True).flatten()
>>> print(af)
[2 None 4 4 -1]
>>> print(b)
[[2 4] [1] [-1 1] [] [-3 -4]]

This defines two arrays: af, a flat masked array, and b, a jagged array.

>>> b + af
...
AttributeError: no column named 'reshape'
>>> af + b
<IndexedMaskedArray [[4 6] None [3 5] [] [-4 -5]] at 0x00011729b9d0>

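The ufunc result depends on operand order: with the jagged array on the left (b + af), the masked array is not passed through and the operation raises, while with the masked array on the left (af + b) it works.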
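For reference, here is a minimal pure-Python sketch of the result I would expect from either operand order. It uses plain lists rather than awkward arrays, and add_masked_to_jagged is a hypothetical helper name, not part of the awkward API:

```python
def add_masked_to_jagged(flat, jagged):
    """Add one optional scalar per row to every element of that row.

    A None in `flat` masks the whole corresponding row of the result.
    Hypothetical reference helper on plain lists, not an awkward function.
    """
    return [
        None if x is None else [x + y for y in row]
        for x, row in zip(flat, jagged)
    ]


# Plain-list stand-ins for af and b from the session above.
af = [2, None, 4, 4, -1]
b = [[2, 4], [1], [-1, 1], [], [-3, -4]]

result = add_masked_to_jagged(af, b)
print(result)  # [[4, 6], None, [3, 5], [], [-4, -5]]
```

Since addition is commutative, b + af should presumably give the same values as af + b.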

>>> idx = (af + b).argmin()
>>> idx
<IndexedMaskedArray [[0] None [0] [] [1]] at 0x000117dca950>

This defines what should be a valid index into b. However, indexing with it fails:

>>> b[idx]
...
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

A workaround (perhaps the basis for an implementation?) is:

>>> idx.copy(content=b[~idx.boolmask()][idx.content])
<IndexedMaskedArray [[2] None [-1] [] [-4]] at 0x000117dcac90>
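To spell out the semantics I have in mind for b[idx], here is a small pure-Python sketch on plain lists. The name take_masked is hypothetical and only illustrates the expected behavior; it is not how awkward implements indexing:

```python
def take_masked(jagged, idx):
    """Pick elements from each row of `jagged` using per-row index lists.

    A None entry in `idx` yields None for that row; an empty index list
    yields an empty row. Hypothetical reference helper on plain lists.
    """
    return [
        None if i is None else [row[j] for j in i]
        for row, i in zip(jagged, idx)
    ]


# Plain-list stand-ins for b and idx from the session above.
b = [[2, 4], [1], [-1, 1], [], [-3, -4]]
idx = [[0], None, [0], [], [1]]

picked = take_masked(b, idx)
print(picked)  # [[2], None, [-1], [], [-4]]
```

This matches the output of the workaround above: masked rows stay masked, and valid rows are indexed row by row.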

Relatedly, I wish that af.boolmask() were exposed as af.mask, and that the current af.mask went private, since it depends on the mask implementation. That would be similar to how af.counts is a universal property. A shorthand for af[~af.boolmask()].content would also be nice.