Closed jpivarski closed 1 year ago
Thanks for raising this @jpivarski-- I'll start working on expanding __getitem__
support.
I'll just point out, though, that series[] will not be able to support this, since pandas handles the input before passing it down to the extension array. We already have series.values.ak[...]
, which is I think exactly what you are after.
series.values.ak[...]
would be fine, if documented.
We already have series.values.ak[...]
I think you mean series.ak[...]
:)
er, that might be it :)
Yea after looking at this code again for the first time in a number of weeks, I now remember the extension array's __getitem__
implementation is there to support common Series workflows (i.e. pd.Series is going to call down to it as mentioned by Martin), while we do want to encourage series.ak[...]
over series.values[...]
for more awkward style workflows. I think this makes it clear that it's time to start a user guide
Not sure if this needs to be a separate PR, but the AwkwardExtensionArray.__repr__
, inherited from the superclass, is making as many Awkward Arrays as there are rows (or visible rows?) which would be expensive if it really is all rows, but is hard to read due to the type information in either case. If it's just a cosmetics issue and not a performance issue, printing the rows with str
instead of repr
would fix it.
https://github.com/intake/awkward-pandas/pull/16 was an attempt to deal with that, we are aware that we need something better.
Quick start docs which will hopefully make clear the difference between acting on the Series and the underlying array are starting to form here: https://awkward-pandas.readthedocs.io/en/latest/quickstart.html
The
AwkwardExtensionArray.__getitem__
method is opinionated about the types that it gets; it would be more powerful if it passes any slicer through to the underlying Awkward Array and lets that code succeed or fail, depending on slicer type.We want to be able to take a Pandas Series that contains an Awkward Array,
series
, and sayor
and have that pass the slice
"x"
or[2, 0, 0, 1]
to the Awkward Array. (TheSince the
__getitem__
checks types,https://github.com/intake/awkward-pandas/blob/4c1020c4d88ebccc8c2bfaad69c3b01b0aba3bac/src/awkward_pandas/array.py#L60-L67
both of these fail. (The
[2, 0, 0, 1]
slice would succeed if we wrapped it as annp.ndarray
.)This came up in scikit-hep/uproot5#734, editing test_0033-more-interpretations-2.py#L82 to be
result.values["x"].tolist()
, whereresult
is a Series.The strategy of
__getitem__
can be to pass whateveritem
it gets as a slice onself._data
, then checkself._data
to see if it's an ak.Array, an ak.Record, or other, and wrapping ak.Array or ak.Record as a new AwkwardExtensionArray. The scalar output types ("other") can be returned as-is.Cc: @kkothari2001