Closed raymondEhlers closed 1 year ago
@jpivarski EmptyArray._remove_structure currently returns []. We could return [self], which would likely solve this bug, and probably be more useful too.
I don't remember what Content._remove_structure means. Does the v1 version of it (effectively, translating from C++) return [self]?
It's a rename of v2's Content._completely_flatten with an extension to support keeping list dimensions (but making the outer dimensions all length 1). We use this for axis=None reduction with keepdims=True.
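As a rough illustration of the idea described above (not Awkward's actual implementation), the following plain-Python sketch flattens nested lists completely, reduces the result, and then re-wraps the answer in length-1 lists, one per original list dimension. That is what keepdims=True means for an axis=None reduction. The helper names completely_flatten, depth_of, and sum_all_keepdims are made up for this example.

```python
def completely_flatten(nested):
    """Return all leaf values of arbitrarily nested lists as one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(completely_flatten(item))
        else:
            flat.append(item)
    return flat


def depth_of(nested):
    """Maximum list nesting depth (0 for a non-list value)."""
    if not isinstance(nested, list):
        return 0
    return 1 + max((depth_of(item) for item in nested), default=0)


def sum_all_keepdims(nested):
    """Sum over every value, keeping each dimension with length 1."""
    result = sum(completely_flatten(nested))
    for _ in range(depth_of(nested)):
        result = [result]
    return result


print(sum_all_keepdims([[1, 2], [], [3]]))  # [[6]]
```

The input has two list dimensions, so the single reduced value comes back wrapped in two length-1 lists.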
The suggestion to change its behaviour is motivated not so much by it being a bug as by the question "what would be more useful?"
The v1 completely_flatten was in Python, in _util.py.
And it had EmptyArrays (unknown types) go to an empty NumPy-like array (of bool_ for some reason[^1]):
So the new behavior of having _remove_structure go to [] is definitely different and could be the cause of a bug that was not in v1. If _remove_structure was returning NumPy-like arrays, EmptyArray should return an empty NumPy-like array, but I think you changed it to return Content subclasses now, right? (To preserve option-types.)
[^1]: Maybe because bool_ combines with any other dtype as an identity? The concatenation of an array of bool_ and an array of T returns an array of T?
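The footnote's guess can be checked directly in NumPy for numeric dtypes. The snippet below is only a small check of NumPy's type-promotion rules, not a statement about why v1 chose bool_; the variable names are arbitrary.

```python
import numpy as np

# An empty boolean array, standing in for a flattened unknown-type array.
empty_bool = np.array([], dtype=np.bool_)
floats = np.array([1.5, 2.5])
ints = np.array([1, 2, 3], dtype=np.int32)

# Concatenating with the empty bool_ array leaves the other dtype unchanged.
combined_float = np.concatenate([empty_bool, floats])
combined_int = np.concatenate([empty_bool, ints])

print(combined_float.dtype)  # float64
print(combined_int.dtype)    # int32
print(np.promote_types(np.bool_, np.complex128))  # complex128
```

For these numeric cases, bool_ does behave like an identity under promotion, which is consistent with the footnote.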
Thanks for digging!
EmptyArray should return an empty NumPy-like array, but I think you changed it to return Content subclasses now, right?
Yes, now this returns 1D contents, so it's legitimate for us to return EmptyArray. In general, if we can avoid interpreting the EmptyArray as some arbitrary NumpyArray type, we preserve information, so that's why I'm in favour of such a solution here.
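To make the difference between the two return values concrete, here is a toy model in plain Python. It is not Awkward's code: the classes ToyNumpyArray, ToyEmptyArray, and ToyListArray and the function flatten_all are hypothetical stand-ins, and the assert inside flatten_all is only an assumed example of a caller that expects at least one flattened piece.

```python
class ToyNumpyArray:
    """Leaf with a known dtype."""

    def __init__(self, values):
        self.values = list(values)

    def remove_structure(self):
        return [self]


class ToyEmptyArray:
    """Leaf of unknown type; it never holds any values."""

    def __init__(self, return_self):
        self.values = []
        self.return_self = return_self

    def remove_structure(self):
        # return_self=False models the current behaviour ([]);
        # return_self=True models the proposed behaviour ([self]).
        return [self] if self.return_self else []


class ToyListArray:
    """List structure wrapped around some content."""

    def __init__(self, content):
        self.content = content

    def remove_structure(self):
        # Lists hold no values themselves; delegate to the content.
        return self.content.remove_structure()


def flatten_all(layout):
    pieces = layout.remove_structure()
    # Assumed caller-side expectation, used here only for illustration.
    assert len(pieces) > 0, "expected at least one flattened piece"
    return [value for piece in pieces for value in piece.values]


print(flatten_all(ToyListArray(ToyNumpyArray([1.0, 2.0]))))        # [1.0, 2.0]
print(flatten_all(ToyListArray(ToyEmptyArray(return_self=True))))  # []
try:
    flatten_all(ToyListArray(ToyEmptyArray(return_self=False)))
except AssertionError as err:
    print("AssertionError:", err)
```

In this toy, returning [self] keeps the caller's expectation satisfied, and the piece it receives is still a ToyEmptyArray, so the "unknown type" information is not replaced by an arbitrary dtype.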
Version of Awkward Array
2.0.6
Description and code to reproduce
In some of my code, I frequently check that there are entries left in the array before trying to proceed to the next step. My use case is calling out to some C++ code to do jet finding event by event, so if there are no jets found, it eventually returns a full array of events, all of which are empty.
In awkward 1.x, I used
ak.flatten(array["data"].px, axis=None)
to flatten out a record (px is one of the fields, not calculated by vector) to check for entries (the particular record didn't really matter - I just needed one). In awkward 2.x, if I do the same, I get an assertion error:
Some more info on the array:
It seems to be about the "unknown" nature of the type, since if I do the same with pt, vector calculates that field, which then types it with float64, and the call works. Also note that if I flatten without axis=None, it works fine. A pickle with the jets array is attached: regression_jets.pkl.zip (with .zip as the extension to allow it to be uploaded). I can in principle switch over to pt, but it would be nicer if this worked as in awkward 1.x. Thanks!
edit: I know for sure that the behavior changed, but now I'm a bit less confident it is truly a regression as opposed to the possibility that it's an intended change in behavior. If so, I suppose please let me know, and I'll have to dig through my codebase to change it.