masonproffitt closed this issue 2 years ago.
I think I'll respond here instead of on the Awkward Array side. `explode_records=True` was a workaround for early versions of Arrow that couldn't write lists-of-records types to Parquet, so this function turned them into simple lists. Arrow can now write this and many more types to Parquet (as of Arrow 2 or 3 or so; they're now on version 6). The `explode_records` option wasn't ported to Awkward version 2 (which requires Arrow 6 as a minimum, if you're using Arrow at all) because it's no longer needed.
So that function argument is going away, and you should remove your dependence on it. The `ak.to_parquet` function only converts data between formats (Awkward Array and Parquet); it will no longer change the data's structure (by exploding records). The structure-changing operation could be its own function, but it's also a one-liner:
```python
>>> import awkward as ak
>>> array = ak.Array([[{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}], [], [{"x": 3.3, "y": [1, 2, 3]}]])
>>> ak.Array({x: array[x] for x in ak.fields(array)})
<Array [{x: [1.1, 2.2], y: [, ... [1, 2, 3]]}] type='3 * {"x": var * float64, "y...'>
```
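To make the structure change concrete without needing Awkward installed, here is a plain-Python analog of the one-liner that works on nested lists and dicts (the form `ak.to_list` produces). It is a sketch, not Awkward's code: `explode_records` is a hypothetical helper, and it assumes every record shares the same fields, taking them from the first record it finds.

```python
from __future__ import annotations

from typing import Any


def explode_records(rows: list[list[dict[str, Any]]]) -> list[dict[str, list[Any]]]:
    # Hypothetical helper: assumes all records share the same fields.
    # Take the field names from the first record found; empty rows
    # contain no records, so they cannot tell us the fields.
    fields: list[str] = []
    for row in rows:
        if row:
            fields = list(row[0].keys())
            break
    # For each row (list of records), build one record whose values are
    # lists: one list per field, in the original record order.
    return [{f: [rec[f] for rec in row] for f in fields} for row in rows]


data = [[{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}], [], [{"x": 3.3, "y": [1, 2, 3]}]]
print(explode_records(data))
# [{'x': [1.1, 2.2], 'y': [[1], [1, 2]]}, {'x': [], 'y': []}, {'x': [3.3], 'y': [[1, 2, 3]]}]
```

Running `ak.to_list` on the one-liner's result should give the same nested structure. If no records exist anywhere, `fields` stays empty and every row becomes `{}`, which is the plain-Python counterpart of `array.fields` being empty.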
(`array.fields` also works in place of `ak.fields(array)`. Both are v2-friendly. There was a shift to consolidate the words used for recordlookup/keys/fields, so now it's just "fields". The high-level function has been `ak.fields` for a while.)
(Oh, and if you do the exploding in a one-liner, you also get to check whether `array.fields` is empty. If I were implementing this in Awkward, I wouldn't want it to depend on the emptiness of `array.fields`, because the array could legitimately be a record with no fields; I'd want to recursively check for any `RecordType`s instead. That's probably overkill for your application, though. Anyway, this puts the power in your hands to decide what you want to do.)
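The recursive check mentioned above can be sketched in plain Python on `ak.to_list`-style data (nested lists and dicts). This is an illustrative analog, not Awkward's implementation: `contains_records` is a hypothetical helper. In Awkward itself you would walk the array's type looking for record types rather than inspecting values.

```python
from __future__ import annotations

from typing import Any


def contains_records(value: Any) -> bool:
    """Hypothetical helper: True if a dict ("record") appears at any depth."""
    if isinstance(value, dict):
        # A record counts even if it has no fields at all.
        return True
    if isinstance(value, list):
        # Recurse into nested lists.
        return any(contains_records(item) for item in value)
    # Numbers, strings, None, etc. are not records.
    return False


# Records with no fields: a field-name check would find nothing,
# but the recursive check still sees that records are present.
print(contains_records([[{}, {}], []]))      # True
print(contains_records([[1.1, 2.2], [3.3]]))  # False
```

Because this inspects values rather than types, it cannot see records hidden inside empty lists (for example, `[[], []]` returns `False`); checking the Awkward type avoids that blind spot.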
This is a workaround for https://github.com/scikit-hep/awkward-1.0/issues/1176.