Open samukweku opened 6 months ago
If you know what fields you want, why do you need a selector? Why not use a simple .select("Mango","Vodka")? Or the existing cs.by_name("Mango","Vodka")?
@aut0clave They want to extract the "range of columns" Mango .. Vodka
I believe first/last are the only selectors that are "positional"
>>> cs.first().meta.serialize()
'{"Nth":0}'
There is no .nth() selector, but it would be easy to add:
>>> df.select( pl.Expr.deserialize( io.StringIO("""{"Nth":3}""") ) )
shape: (3, 1)
┌───────┐
│ Mango │
│ --- │
│ i64 │
╞═══════╡
│ 4 │
│ 10 │
│ 90 │
└───────┘
The nth -> column name mapping is done here:
From what I can tell, there is nothing that goes the other way, i.e. column name -> nth, which I think would be needed in order to support this at the selector level?
@cmdlineluser I'd assume there is a way to get the positions of the column names (maybe grab the positions via list.index from Python and pass them to the Rust end). I don't know much about the internal implementation, but I'm happy to learn. I'd also suggest, if the team feels this is a worthwhile addition, that the slicing be limited to column names only (numeric positions should not be supported).
> @cmdlineluser i'd assume there was a way to get the positions of the column names (maybe grab the positions via list.index from python and pass it to the rust end).
FYI: until we are actually evaluating a lazy query plan we may not know the position of all of the columns (e.g. when expanding a struct, or evaluating earlier selectors). Consequently we can't precompute the positions and pass them down, because the answer is only known at the lower level (selectors are dynamic, evaluating internally at the point they are invoked) ;)
Offering index-based selection doesn't seem like a bad idea (we currently only support selection by name/dtype and the special cases of first/last, as noted by @cmdlineluser), but would need some internal additions to be possible 🤔
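For illustration, the list.index idea suggested above can be sketched in plain Python. This is only a sketch for the eager case, where the column names are already known; as noted, it cannot be precomputed for a lazy query plan. The helper name label_slice_to_positions and the column names other than Mango and Vodka are hypothetical, and the inclusive end label is an assumption about how a label slice would behave.

```python
def label_slice_to_positions(columns, start, stop):
    """Resolve an inclusive label slice (start..stop) to column positions.

    Hypothetical helper: assumes `columns` is an already-known list of
    unique column names (eager case only).
    """
    first = columns.index(start)  # raises ValueError if the label is missing
    last = columns.index(stop)
    if first > last:
        raise ValueError(f"{start!r} comes after {stop!r} in the schema")
    return range(first, last + 1)  # +1 makes the end label inclusive


# Made-up schema; only "Mango" and "Vodka" come from the discussion.
columns = ["Apple", "Banana", "Kiwi", "Mango", "Peach", "Vodka", "Zucchini"]
positions = label_slice_to_positions(columns, "Mango", "Vodka")
selected = [columns[i] for i in positions]
```

Here positions is a plain range, so any index-based selection mechanism could consume it once the schema is known.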
@cmdlineluser so something like cs.by_position, cs.by_range?
@alexander-beedie is the person to ask. (they created selectors :-D)
> @cmdlineluser so something like cs.by_position, cs.by_range?
Probably cs.by_index, which would take one or more index values, a range, or a slice (as range/slice can be directly expanded into a list of indexes, so internally we just need to handle that). Does need additional low-level support though.
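A rough sketch of the expansion described here, in plain Python rather than the actual Polars internals: ints, ranges, and slices are flattened into one list of indexes. The function name expand_indices is hypothetical, and it assumes the number of columns is known at expansion time, which a slice with open or negative bounds needs.

```python
from typing import Iterable, List, Union

IndexSpec = Union[int, range, slice]


def expand_indices(specs: Iterable[IndexSpec], n_columns: int) -> List[int]:
    """Flatten ints, ranges, and slices into a single list of indexes.

    Hypothetical helper, not the Polars implementation.
    """
    indexes: List[int] = []
    for spec in specs:
        if isinstance(spec, int):
            indexes.append(spec)
        elif isinstance(spec, range):
            indexes.extend(spec)  # a range expands directly
        elif isinstance(spec, slice):
            # slice.indices clamps the bounds against the column count
            indexes.extend(range(*spec.indices(n_columns)))
        else:
            raise TypeError(f"unsupported index spec: {spec!r}")
    return indexes


expanded = expand_indices([0, range(2, 4), slice(-2, None)], n_columns=6)
```

Once everything is a flat list of ints, the lower level only has to handle one input shape, which seems to be the point being made above.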
FYI, forgot to update this issue, but we do now have a new cs.by_index selector which can take indices and ranges, which gets you some of the way there: https://github.com/pola-rs/polars/pull/16217
Thanks @alexander-beedie. Looks good. Safe to assume that slicing with labels may be implemented at a future date?
> Thanks @alexander-beedie. Looks good. Safe to assume that slicing with labels may be implemented at a future date?
Probably, but no timeline; the 1.0 (and a few quick point releases to address any related issues) has priority at the moment. And I'm on vacation for the next two weeks ;)
Description
Hi team. I would like to suggest adding a slice method to the selectors class, where users can select a slice of columns. The slicing syntax can be: