jpivarski opened this issue 1 year ago.
I agree that this is a policy question.
I'd vote in favour of not recursively zipping, because `ak.zip` accepts useful parameters that might not apply to each call to `ak.zip` identically, i.e. the user may well want different `depth_limit` values. For simplicity and consistency, I'd prefer to require the user to call `ak.zip` multiple times.
I have no problem with this being the default behavior, but I'd love to see automatic recursive zipping as a feature (maybe as an optional argument to `ak.zip`?). This is a pretty common use case in handling func-adl-uproot queries.
In dask-contrib/dask-awkward/issues/213, @masonproffitt asked for `ak.zip` on a nested dict to work in dask-awkward as it does in Awkward. But `ak.zip` doesn't really do a nested zip; it just calls `ak.to_layout` on each of the values of the dict: https://github.com/scikit-hep/awkward/blob/6a24ed0d436bcd158f634d9bd9f6d664fff6bd2b/src/awkward/operations/ak_zip.py#L174-L190
For a nested dict (general, non-Awkward, non-ndarray container), that means it switches over to `ak.from_iter`, which (1) is slow, (2) ignores numeric types, and (3) doesn't zip: the data type you get back reflects the difference between an array of structs and a struct of arrays.
You see all of the integer types turn into `int64` and `float32` into `float64` because `ak.from_iter` treats them as Python `int` and `float`, which loses dtype. You also see a different structure for the nested object.

It's not obvious to me what the correct behavior is. Treating any expected array-like uniformly with `ak.to_layout` is good for consistency, but @masonproffitt's interpretation is natural, too.

Originally posted by @jpivarski in https://github.com/dask-contrib/dask-awkward/issues/213#issuecomment-1497887851