Closed chuyuanliu closed 8 months ago
Thanks! With a change in speed like that, you've probably found a case in which the Awkward Array was converted into Python objects with to_list and then converted back with from_iter. That's not supposed to ever happen, but a 2500× speedup is suggestive that it did happen here.
Since you've also solved the issue, this should be a PR, and I got one started for you in #49.
Hi,
I am trying to concatenate multiple dataframes using pandas.concat(). When there are columns of awkward series, this process seems to be extremely slow. The packages I am using are
I dug a little into the source code and found these lines: https://github.com/intake/awkward-pandas/blob/d0f789388a9a0517c4c7c722bd7f3656910b5260/src/awkward_pandas/array.py#L130-L132
It looks like this code actually calls ak.concatenate on pandas.Series instead of the raw array. A fix that works for me is to do something like:
Tested on the following sample:
The last line takes ~500 s without the fix and ~0.2 s with the fix.