Open cpcloud opened 2 weeks ago
I can understand wanting the built-in relocate functionality as a nice shorthand.
Would we also attempt to allow this style of ordering when the c selector is combined with others?
>>> t.select(s.c("body_mass_g") | s.numeric())
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ year  ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━┩
│ float64        │ float64       │ int64             │ int64       │ int64 │
├────────────────┼───────────────┼───────────────────┼─────────────┼───────┤
...
Should that bump body_mass_g to the front?
I'm going to see what dplyr does.
It looks like dplyr takes an eager approach and short-circuits the selector match, so the output order follows the first matching selector:
> library(dplyr)
> t <- as_tibble(data.frame(x=c('a'), y=c('b')))
> t
# A tibble: 1 × 2
  x     y
  <chr> <chr>
1 a     b
> t |> select(c(y, x))
# A tibble: 1 × 2
  y     x
  <chr> <chr>
1 b     a
This output matches the selector order, c(y, x).
In the next example, !where(is.numeric) matches both columns first, so the table's original order is preserved:
> t |> select(!where(is.numeric) | c(y, x))
# A tibble: 1 × 2
  x     y
  <chr> <chr>
1 a     b
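To make the ordering rule in the two dplyr examples concrete, here is a rough pure-Python model of it. This is only a sketch of the behavior observed above, not how dplyr or tidyselect is implemented; the helpers c, where, and union are hypothetical names invented for illustration, and the "chr"/"dbl" type labels are stand-ins.

```python
# Toy model: a selector is a function that takes the table's columns
# (name -> type, in table order) and returns an ordered list of names.

def c(*names):
    """Explicit column list: the result follows the order the names were given."""
    def select(columns):
        missing = [n for n in names if n not in columns]
        if missing:
            raise KeyError(f"unknown columns: {missing}")
        return list(names)
    return select


def where(predicate):
    """Predicate selector: the result follows table order."""
    def select(columns):
        return [name for name, dtype in columns.items() if predicate(dtype)]
    return select


def union(left, right):
    """Left operand's columns first, then any new columns from the right."""
    def select(columns):
        out = left(columns)
        seen = set(out)
        for name in right(columns):
            if name not in seen:
                out.append(name)
                seen.add(name)
        return out
    return select


columns = {"x": "chr", "y": "chr"}  # same shape as the tibble above
not_numeric = where(lambda dtype: dtype != "dbl")

first = c("y", "x")(columns)
second = union(not_numeric, c("y", "x"))(columns)
print(first)   # ['y', 'x']
print(second)  # ['x', 'y']
```

Under this model the left operand of a union fixes the order of whatever it matches, which is why putting the predicate first leaves the table order intact.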
I think @jcrist had some thoughts on this one.
Right now, no matter what kind of selector is being used, the expanded columns are in the order of the table they are matched against.
For selectors where no obvious column order exists when they are specified, which I believe is most of them, this makes sense.
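As a rough illustration of the table-order behavior described above, here is a pure-Python sketch. It is not Ibis's implementation; expand_in_table_order, by_name, and all_columns are hypothetical helpers, and the column list is just the five columns visible in the output earlier in the thread.

```python
def expand_in_table_order(table_columns, *selectors):
    """Return every matched column in table order, ignoring selector order."""
    matched = set()
    for selector in selectors:
        matched.update(selector(table_columns))
    return [name for name in table_columns if name in matched]


def by_name(*names):
    """Hypothetical stand-in for an explicit column selector."""
    return lambda cols: [n for n in names if n in cols]


def all_columns():
    """Hypothetical stand-in for a selector that matches every column."""
    return lambda cols: list(cols)


table_columns = [
    "bill_length_mm",
    "bill_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
    "year",
]

# Naming body_mass_g first does not move it to the front.
combined = expand_in_table_order(
    table_columns, by_name("body_mass_g"), all_columns()
)
print(combined)

# Even an explicit list comes back in table order, not the order given.
explicit = expand_in_table_order(table_columns, by_name("year", "body_mass_g"))
print(explicit)  # ['body_mass_g', 'year']
```

Because the matches are collected into a set and then filtered against the table's column list, the order in which names are written has no effect on the result.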
I'm wondering whether we should try to preserve order when the c selector is in the mix. Here's an example of what currently happens:
This came up while writing TPC-DS query 18, where I wanted to select across c in the agg method. The order in which I specified the columns differed from the input table's, so the output wasn't what I expected.
Let's avoid discussing implementation until we decide we want to do this.