ssl-hep / ServiceX_Uproot_Transformer

Transformer Image for Uproot-Based Transforms
BSD 3-Clause "New" or "Revised" License

Fix empty output for arrays that aren't in container objects #29

Status: Closed (masonproffitt closed this 2 years ago)

masonproffitt commented 2 years ago

This is a workaround for https://github.com/scikit-hep/awkward-1.0/issues/1176.

jpivarski commented 2 years ago

I think I'll respond here instead of on the Awkward Array side. explode_records=True was a workaround for early versions of Arrow that couldn't write list-of-records types to Parquet, so this function turned them into simple lists. Arrow can now write these and many more types to Parquet (as of Arrow 2 or 3 or so; now they're on version 6). The explode_records option wasn't ported to Awkward version 2 (which requires Arrow 6 as a minimum, if you're using Arrow at all) because it's no longer needed.

So that function argument is going away and you should remove your dependence on it. The ak.to_parquet function only converts data between formats (Awkward Array and Parquet); it will no longer change its structure (exploding). The structure-changing operation could be its own function, but it's also a one-liner:

>>> import awkward as ak
>>> array = ak.Array([[{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}], [], [{"x": 3.3, "y": [1, 2, 3]}]])
>>> ak.Array({x: array[x] for x in ak.fields(array)})
<Array [{x: [1.1, 2.2], y: [, ... [1, 2, 3]]}] type='3 * {"x": var * float64, "y...'>

(array.fields also works in place of ak.fields(array). These are both v2-friendly. There was a shift to consolidate words used for recordlookup/keys/fields, so now it's just "fields". The high-level function has been ak.fields for a while.)

jpivarski commented 2 years ago

(Oh, and if you do the exploding in a one-liner, you can also check whether array.fields is empty. If I were implementing this in Awkward, I wouldn't want it to depend on the emptiness of array.fields because it could legitimately be a record with no fields, so I'd want to recursively check for any RecordTypes, but that's probably overkill for your application. Anyway, this puts the power in your hands to decide what you want to do.)