scikit-hep / awkward

Manipulate JSON-like data with NumPy-like idioms.
https://awkward-array.org
BSD 3-Clause "New" or "Revised" License

unexpected behaviour of ak.where with arrays containing Nones #3098

Closed maadcoen closed 1 week ago

maadcoen commented 2 months ago

Version of Awkward Array

1.10.5

Description and code to reproduce

I ran into surprising behaviour of ak.where when it is used on arrays of mixed datatype (I'm not sure about the terminology; I mean arrays whose type allows None values, as indicated by the question mark in the type string, e.g. ?int64).

Basically, ak.where gives the wrong result when the boolean condition is of mixed datatype and one of the choices contains a None value. In that case the output is None, even when the condition itself contains no Nones and the None value should not have been selected. For the choice arrays, the problem only arises when one of them actually contains a None. For the condition, it is enough that its datatype is mixed. The code below reproduces this behaviour.

import awkward as ak

# Condition of mixed datatype (?bool) that happens to contain no None
mixed_type_cond = ak.Array([[True], [None]])[0]
# Condition of plain bool type
pure_type_cond = ak.Array([True])
none_alternative = ak.Array([None])
zero_alternative = ak.Array([0])
# Alternative of mixed datatype (?int64) that happens to contain no None
mixed_zero_alternative = ak.Array([[0], [None]])[0]

# Mixed-type condition: the first call returns None although the condition selects 1
ak.where(mixed_type_cond, 1, none_alternative)
# <Array [None] type='1 * ?int64'>
ak.where(mixed_type_cond, 1, zero_alternative)
# <Array [1] type='1 * ?int64'>
ak.where(mixed_type_cond, 1, mixed_zero_alternative)
# <Array [1] type='1 * ?int64'>

# Pure-type condition: all three calls return 1
ak.where(pure_type_cond, 1, none_alternative)
# <Array [1] type='1 * ?int64'>
ak.where(pure_type_cond, 1, zero_alternative)
# <Array [1] type='1 * ?int64'>
ak.where(pure_type_cond, 1, mixed_zero_alternative)
# <Array [1] type='1 * ?int64'>

# Inverted mixed-type condition with the choices swapped: same problem in the first call
ak.where(~mixed_type_cond, none_alternative, 1)
# <Array [None] type='1 * ?int64'>
ak.where(~mixed_type_cond, zero_alternative, 1)
# <Array [1] type='1 * ?int64'>
ak.where(~mixed_type_cond, mixed_zero_alternative, 1)
# <Array [1] type='1 * ?int64'>
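
For reference, the element-wise behaviour I would expect can be written down in plain Python. This is only a sketch of the intended semantics, not how Awkward Array implements ak.where. The names where_reference and _at are hypothetical helpers made up for this illustration, and returning None for a missing condition is an assumption on my part.

```python
def _at(value, i):
    """Return element i of a list, or the value itself if it is a scalar."""
    return value[i] if isinstance(value, list) else value


def where_reference(cond, x, y):
    """Element-wise selection over Python lists, with None as a missing value.

    Assumption (not confirmed in this issue): a missing condition gives a
    missing result. Otherwise only the selected branch is looked at, so a
    None in the unselected branch cannot leak into the output.
    """
    out = []
    for i, c in enumerate(cond):
        if c is None:
            out.append(None)
        elif c:
            out.append(_at(x, i))
        else:
            out.append(_at(y, i))
    return out


# Mirrors ak.where(mixed_type_cond, 1, none_alternative)
print(where_reference([True], 1, [None]))   # [1]
# Mirrors ak.where(~mixed_type_cond, none_alternative, 1)
print(where_reference([False], [None], 1))  # [1]
```

Under these assumptions, both calls that currently return None would return 1, matching the results obtained with the pure-type condition.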

tcawlfield commented 1 month ago

I added a unit test and did some preliminary characterization. See https://github.com/scikit-hep/awkward/blob/9c66fb23e436d5631b794ba7c5e3f96b181aebc4/tests/test_3098_ak_where_with_arrays_containing_optionals.py.