scikit-hep / awkward

Manipulate JSON-like data with NumPy-like idioms.
https://awkward-array.org
BSD 3-Clause "New" or "Revised" License

Any handling of non-canonical layout combinations can be removed #1924

Open jpivarski opened 1 year ago

jpivarski commented 1 year ago

After #1910, any handling of non-canonical layouts ("level 1") will become dead code, uncoverable, because we won't be able to make those non-canonical layouts anymore. (Good!) So this issue is asking to clean them up, replacing complex implementations with simple ones.

For example, is_none does this:

https://github.com/scikit-hep/awkward/blob/3edac9b342ea13d4ede6ff9f541da60aee72bb2b/src/awkward/operations/ak_is_none.py#L33-L59

when all you need (for canonical layouts) is this:

https://github.com/scikit-hep/awkward/blob/500e0dd06fa4aa542ac0226da24851fb730e5042/src/awkward/operations/structure.py#L2857-L2875

The more complex version was implemented in PR #1249, for issue #1193, which was motivated by this layout:

>>> index_of_index = ak.Array(
...     ak.layout.IndexedOptionArray64(
...         ak.layout.Index64(np.r_[0, 1, 2, 3]),
...         ak.layout.IndexedOptionArray64(
...             ak.layout.Index64(np.r_[0, -1, 2, 3]),
...             ak.layout.NumpyArray(np.r_[1, 2, 3, 4]),
...         ),
...     )
... )
>>> index_of_index
<Array [1, None, 3, 4] type='4 * ?int64'>
>>> ak.is_none(index_of_index)
<Array [False, False, False, False] type='4 * bool'>

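which is non-canonical (an option-type directly wrapping another option-type) and not allowed. The simple implementation only inspects the outer index, which has no negative entries, so it misses the None at position 1.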
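
To illustrate why canonical layouts make the simple check sufficient, here is a minimal NumPy-only sketch. It is not Awkward's actual implementation: `compose_option_indexes` is a hypothetical helper that collapses the two nested option indexes from the example above into a single one, after which "is this entry None?" reduces to `index < 0`.

```python
import numpy as np


def compose_option_indexes(outer, inner):
    """Collapse an option-of-option index pair into one index.

    Hypothetical helper for illustration only; not part of Awkward Array.
    A negative entry in either index means "missing".
    """
    outer = np.asarray(outer, dtype=np.int64)
    inner = np.asarray(inner, dtype=np.int64)
    # Start with everything missing, then fill in entries the outer index keeps.
    result = np.full(len(outer), -1, dtype=np.int64)
    valid = outer >= 0
    # For kept entries, look through to the inner index (which may itself be negative).
    result[valid] = inner[outer[valid]]
    return result


# Indexes taken from the index_of_index example above.
outer = np.array([0, 1, 2, 3])
inner = np.array([0, -1, 2, 3])

# Simple check on the non-canonical layout: only the outer index is inspected.
naive = outer < 0

# Same simple check after collapsing to a single (canonical) option index.
combined = compose_option_indexes(outer, inner)
canonical = combined < 0

print(naive.tolist())      # [False, False, False, False]
print(combined.tolist())   # [0, -1, 2, 3]
print(canonical.tolist())  # [False, True, False, False]
```

If layouts are always built in the collapsed form, only the `index < 0` branch is ever needed, which is the simplification this issue asks for.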

jpivarski commented 9 months ago

It might be possible to detect these situations through coverage tests.