tidyverse / tidyr

Tidy Messy Data
https://tidyr.tidyverse.org/

num_range() with multiple prefixes gives alternating result #1552

Closed: t-pollington closed this issue 2 months ago

t-pollington commented 4 months ago

I want to select the columns M1, F1, M2 and F2. If I specify them with a single call to num_range(), it unexpectedly selects only two columns (M1 and F2) instead of the four I expect. Below I show the expected result using starts_with(), but I'd prefer num_range() because it also lets me be explicit about the numbering.

library(tidyverse)

df <- data.frame(M1 = 1, F1 = 3, M2 = 5, F2 = 0)

# One num_range() call with two prefixes and two numbers
df %>% select(num_range(c("M", "F"), 1:2))
#>   M1 F2
#> 1  1  0

# The four expected columns, obtained with starts_with()
df %>% select(starts_with(c("M", "F")))
#>   M1 M2 F1 F2
#> 1  1  5  3  0

Created on 2024-05-29 with reprex v2.1.0
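The two-column result is what you get when prefixes and numbers are paired element-wise instead of crossed (the maintainer's reply describes num_range() as recycling prefix against range). A minimal base-R sketch of the difference, assuming num_range() builds its candidate names the way paste0() does:

```r
prefix <- c("M", "F")
index <- 1:2

# Element-wise recycling: prefix[i] is pasted to index[i],
# reproducing the two columns num_range() returned.
paste0(prefix, index)
#> [1] "M1" "F2"

# Cartesian product: every prefix with every number.
# outer() builds a 2 x 2 matrix (prefixes down the rows, numbers across
# the columns); as.vector() reads it column by column.
as.vector(outer(prefix, index, paste0))
#> [1] "M1" "F1" "M2" "F2"
```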

DavisVaughan commented 2 months ago

num_range() recycles prefix and range against each other (element-wise), although this isn't documented very well. I've opened a tidyselect issue about it. In the meantime, you could generate the combinations yourself and then pass them to starts_with():

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- data.frame(M1 = 1, F1 = 3, M2 = 5, F2 = 0)

# Build every prefix/index combination (a cross product, not recycling)
combinations <- vctrs::vec_expand_grid(
  name = c("M", "F"),
  index = 1:2
)
names <- paste0(combinations$name, combinations$index)
names
#> [1] "M1" "M2" "F1" "F2"

df %>% select(starts_with(names))
#>   M1 M2 F1 F2
#> 1  1  5  3  0

Created on 2024-07-27 with reprex v2.0.2
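A variation on the workaround above, sketched under the assumption that the wanted column names are known exactly: building the combinations with base expand.grid() avoids the vctrs dependency, and all_of() selects exact names, whereas starts_with("M1") would also match a hypothetical column such as M10. The variable name `col_names` is illustrative only (it avoids masking base::names()).

```r
library(dplyr)

df <- data.frame(M1 = 1, F1 = 3, M2 = 5, F2 = 0)

# Cross every prefix with every number using base R.
# expand.grid() varies its first argument fastest, so listing `index`
# first yields prefix-major order: M1, M2, F1, F2.
combinations <- expand.grid(
  index = 1:2,
  name = c("M", "F"),
  stringsAsFactors = FALSE
)
col_names <- paste0(combinations$name, combinations$index)
col_names
#> [1] "M1" "M2" "F1" "F2"

# all_of() matches names exactly and errors if any is missing;
# any_of() would silently skip absent columns instead.
selected <- df %>% select(all_of(col_names))
selected
#>   M1 M2 F1 F2
#> 1  1  5  3  0
```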