experimental for nested

@jpivarski: the query function here runs on the play data generated by nested-pandas roughly 8x faster than the typical approach we discussed (23.2 ms vs 183 ms in the timings below), even with the UnmaskedArray PR. It is also about 3.6x faster than nested-pandas' own query.

Generate the play data:

from nested_pandas.datasets import generate_data
import awkward as ak
import akimbo.pandas
import akimbo.exp  # this PR, experimental

nf = generate_data(1000, 10000)  # 1000 rows, 10000 nested rows per row
arr = nf.ak.array
arr2 = akimbo.exp.rec_list_swap(arr, "nested")  # to list-of-records

Times:

%timeit nf_g = nf.query("nested.t > 17.0");
83.8 ms ± 351 µs
%timeit arr["nested"][arr["nested", "t"] > 17]
183 ms ± 1.56 ms
%timeit akimbo.exp.query(arr2, "nested.t > 17")
23.2 ms ± 568 µs

Note that this produces a masked array: it has exactly the same structure as the original (swapped) array, with None wherever the filter fails. Otherwise you would need ak.count, which takes about 50 ms.

It feels like it should be possible to do this really efficiently with ArrayBuilder and numba. You would need a way to turn the "query" into something you can execute in the loop.

intake / akimbo

experimental for nested #79