scikit-hep / awkward

Manipulate JSON-like data with NumPy-like idioms.
https://awkward-array.org
BSD 3-Clause "New" or "Revised" License
`ak.to_arrow_table()` fails on `PartitionedArray`s #1093

Closed: masonproffitt closed this issue 2 years ago

masonproffitt commented 3 years ago

Version of Awkward Array

1.5.0

Description and code to reproduce

#703 fixed `ak.to_arrow()` for `PartitionedArray`s, but not `ak.to_arrow_table()`. For example, this part works fine:

>>> import uproot
>>> import awkward as ak
>>> array = uproot.lazy('scalars_tree_file.root:tree')['int_branch']
>>> array.layout
<IrregularlyPartitionedArray>
    <partition start="0" stop="2">
        <VirtualArray cache_key="aec71362-c2dd-11ea-91f6-3b01a8c0beef:/tree;1:int_branch(0):AsDtype(Bi4(),Li4()):0-2:ak">
            <ArrayGenerator f="<bound method TBranch.array of <TBranch 'int_branch' at 0x7f7edec57eb0>>" args="(None, 0, 2, <TrivialExecutor at 0x7f7edecc5b20>, <TrivialExecutor at 0x7f7edecc5550>, None, 'ak')">
                <length>2</length>
                <form>
                    {
                        "class": "NumpyArray",
                        "itemsize": 4,
                        "format": "i",
                        "primitive": "int32"
                    }
                </form>
            </ArrayGenerator>
            <ArrayCache mapping="<LRUArrayCache (32/100000000 bytes full) at 0x7..."/>
            <array><NumpyArray format="i" shape="2" data="0 -1" at="0x555df8251c90"/></array>
        </VirtualArray>
    </partition>
</IrregularlyPartitionedArray>
>>> ak.to_arrow(array)
<pyarrow.lib.ChunkedArray object at 0x7f7edecd1e00>
[
  [
    0,
    -1
  ]
]

But converting the same array to a table fails:

>>> ak.to_arrow_table(array)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/miniconda3/envs/iris-hep/lib/python3.9/site-packages/awkward/operations/convert.py", line 2578, in to_arrow_table
    batch = pyarrow.RecordBatch.from_arrays(pa_arrays, schema=pyarrow.schema(pa_fields))
  File "pyarrow/table.pxi", line 1034, in pyarrow.lib.RecordBatch.from_arrays
TypeError: Cannot convert pyarrow.lib.ChunkedArray to pyarrow.lib.Array

masonproffitt commented 3 years ago

This case already seems to be handled properly in `to_parquet()`:

https://github.com/scikit-hep/awkward-1.0/blob/665a1870b0b05d1310881af583f4e05a3119bc15/src/awkward/operations/convert.py#L2947-L3067

Could most of this logic be moved into `to_arrow_table()`, so that `to_parquet()` just calls `to_arrow_table()` to get the table it writes?

jpivarski commented 2 years ago

PartitionedArrays are being dropped in v2, in favor of Daskified Awkward Array.