Closed jpivarski closed 4 years ago
Putting here an example of the pyarrow behavior:
In [1]: import pyarrow as pa
In [2]: pa.array(range(5))
Out[2]:
<pyarrow.lib.Int64Array object at 0x112289c90>
[
0,
1,
2,
3,
4
]
In [3]: pa.array(range(5)).take(pa.array([1, None, 2]))
Out[3]:
<pyarrow.lib.Int64Array object at 0x1122dd130>
[
1,
null,
2
]
pyarrow doesn't support it, but a logical extension should also do this:
>>> pa.array(range(5)).compress(pa.array([False, True, None, None, True]))
[
1,
null,
null,
4
]
Of course, "compress" is a terrible name, and pyarrow's compress function does the more logical thing: lossless compression. However, when these are used in __getitem__ without special names like take and compress, the above is what a user would expect.
Step 1 is done (in PR #111):
>>> ak.Array(range(5))[ak.Array([1, None, 2])]
<Array [1, None, 2] type='3 * ?int64'>
Step 2 is done (also in PR #111):
>>> ak.Array(range(5))[ak.Array([False, True, None, None, True])]
<Array [1, None, None, 4] type='4 * ?int64'>
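To pin down the intended semantics of the two flat cases above, here is a minimal pure-Python sketch. It is only a reference model, not how awkward or pyarrow implement it, and the function names take_with_none and mask_with_none are made up for illustration.

```python
def take_with_none(values, indices):
    """Integer selection: a None index produces a None in the output."""
    return [None if i is None else values[i] for i in indices]


def mask_with_none(values, mask):
    """Boolean selection: True keeps the value, False drops it,
    and None produces a None in the output."""
    if len(mask) != len(values):
        raise ValueError("mask must have the same length as values")
    out = []
    for value, flag in zip(values, mask):
        if flag is None:
            out.append(None)
        elif flag:
            out.append(value)
    return out


values = list(range(5))
print(take_with_none(values, [1, None, 2]))                      # [1, None, 2]
print(mask_with_none(values, [False, True, None, None, True]))   # [1, None, None, 4]
```

Note that a None in the mask still occupies a position: the final True lines up with the value 4, not 2.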
And all the jagged slices:
>>> array = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5], [6.6], [7.7, 8.8, 9.9]])
>>> ak.tolist(array[[[0, -1], [], [], [0, 0, 0], [-1, -2, -3]]])
[[1.1, 3.3], [], [], [6.6, 6.6, 6.6], [9.9, 8.8, 7.7]]
>>> ak.tolist(array[[[0, None, -1], [None], [], [0, None, 0], [-1, -2, -3]]])
[[1.1, None, 3.3], [None], [], [6.6, None, 6.6], [9.9, 8.8, 7.7]]
>>> ak.tolist(array[[[0, -1], None, [], [], None, [0, 0, 0], [-1, -2, -3]]])
[[1.1, 3.3], None, [], [], None, [6.6, 6.6, 6.6], [9.9, 8.8, 7.7]]
>>> ak.tolist(array[[[0, None, -1], None, [None], [], None, [0, 0, 0], [-1, -2, -3]]])
[[1.1, None, 3.3], None, [None], [], None, [6.6, 6.6, 6.6], [9.9, 8.8, 7.7]]
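The jagged integer examples above can be reproduced with a small pure-Python model. This is a sketch inferred from the outputs shown here, not awkward's implementation, and jagged_take is a hypothetical name. One behavior worth noticing in the examples: a None at the outer level emits a None without consuming a row of the array, so the index may be longer than the array.

```python
def jagged_take(array, index):
    """Apply one list of integer indices per row of a jagged array.

    Outer None: emit None and do not advance to the next row.
    Inner None: emit None inside the selected row.
    """
    consumed = sum(1 for subindex in index if subindex is not None)
    if consumed != len(array):
        raise ValueError("need exactly one non-None index list per row")
    rows = iter(array)
    out = []
    for subindex in index:
        if subindex is None:
            out.append(None)
        else:
            row = next(rows)
            out.append([None if i is None else row[i] for i in subindex])
    return out


array = [[1.1, 2.2, 3.3], [], [4.4, 5.5], [6.6], [7.7, 8.8, 9.9]]
print(jagged_take(array, [[0, None, -1], None, [None], [], None, [0, 0, 0], [-1, -2, -3]]))
# [[1.1, None, 3.3], None, [None], [], None, [6.6, 6.6, 6.6], [9.9, 8.8, 7.7]]
```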
And jagged mask (almost forgot the most important case!):
>>> ak.tolist(array[[[False, False, True], [], [True, True], [False], [True, False, True]]])
[[3.3], [], [4.4, 5.5], [], [7.7, 9.9]]
This can also have None:
>>> ak.tolist(array[[[False, False, True], None, [], None, [True, True], [False], [True, False, True]]])
[[3.3], None, [], None, [4.4, 5.5], [], [7.7, 9.9]]
Getting None values in the inner layer (correctly across jagged boundaries) was more difficult, but it's done now:
>>> ak.tolist(array[[[False, True, None], [None], [None, True], [False], [True, False, True]]])
[[2.2, None], [None], [None, 5.5], [], [7.7, 9.9]]
You can even do them at both levels. :)
>>> ak.tolist(array[[[False, True, None], None, [None], None, [None, True], [False], [True, False, True]]])
[[2.2, None], None, [None], None, [None, 5.5], [], [7.7, 9.9]]
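The jagged mask examples above can be modeled the same way. Again, this is a pure-Python sketch inferred from the outputs in this thread rather than awkward's actual code, and jagged_mask is a hypothetical name. It assumes each inner mask is read position by position: True selects the row's value at that position, False drops it, and None emits None without looking at the row (which is why [None] against an empty row yields [None]).

```python
def jagged_mask(array, mask):
    """Apply one boolean mask per row of a jagged array, allowing None
    at the outer level (whole entry) and at the inner level (one item)."""
    consumed = sum(1 for submask in mask if submask is not None)
    if consumed != len(array):
        raise ValueError("need exactly one non-None mask per row")
    rows = iter(array)
    out = []
    for submask in mask:
        if submask is None:
            out.append(None)      # outer None: no row is consumed
            continue
        row = next(rows)
        selected = []
        for position, flag in enumerate(submask):
            if flag is None:
                selected.append(None)           # inner None: keep a placeholder
            elif flag:
                selected.append(row[position])  # True: keep the value
        out.append(selected)
    return out


array = [[1.1, 2.2, 3.3], [], [4.4, 5.5], [6.6], [7.7, 8.8, 9.9]]
print(jagged_mask(array, [[False, True, None], None, [None], None, [None, True], [False], [True, False, True]]))
# [[2.2, None], None, [None], None, [None, 5.5], [], [7.7, 9.9]]
```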
So this issue is closed. The tests in tests/test_PR111_jagged_and_masked_getitem.py are much more extensive.
Relies upon #66.
Follow pyarrow.Array's behavior for slicing with masked arrays (IndexedOptionArray, BitMaskedArray, and eventually ByteMaskedArray).
Will need to extend the Slice hierarchy and add jagged and masked cases to Content::getitem_*.