tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

across(), but over two sets of vectors #7056

Open abichat opened 4 months ago

abichat commented 4 months ago

Sometimes, when wrangling data, you need to apply a function not only to multiple columns but also to multiple pairs of columns. The {dplyover} package supports this with its across2() and across2x() functions:

# remotes::install_github("TimTeaFan/dplyover")
library(dplyr)
library(dplyover)

iris %>% 
  group_by(Species) %>%
  summarise(across2x(starts_with("Sepal"), starts_with("Petal"), cor))
#> # A tibble: 3 × 5
#>   Species   Sepal.Length_Petal.L…¹ Sepal.Length_Petal.W…² Sepal.Width_Petal.Le…³
#>   <fct>                      <dbl>                  <dbl>                  <dbl>
#> 1 setosa                     0.267                  0.278                  0.178
#> 2 versicol…                  0.754                  0.546                  0.561
#> 3 virginica                  0.864                  0.281                  0.401
#> # ℹ abbreviated names: ¹​Sepal.Length_Petal.Length, ²​Sepal.Length_Petal.Width,
#> #   ³​Sepal.Width_Petal.Length
#> # ℹ 1 more variable: Sepal.Width_Petal.Width <dbl>

cor(iris[1:50,]$Sepal.Length, iris[1:50,]$Petal.Length)
#> [1] 0.2671758

Created on 2024-07-17 with reprex v2.1.0
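
For comparison, here is a rough sketch of how the same per-group pairwise correlations might be computed with {dplyr} alone. The helper name `pairwise_cor` and its arguments are hypothetical (they are not part of {dplyr} or {dplyover}), and the column names are spelled out by hand instead of using tidyselect helpers such as starts_with().

```r
library(dplyr)

# Hypothetical helper (not part of dplyr or dplyover): correlates every
# column named in `xs` with every column named in `ys`.
pairwise_cor <- function(data, xs, ys) {
  # Build all (x, y) combinations, with x varying slowest.
  x_names <- rep(xs, each = length(ys))
  y_names <- rep(ys, times = length(xs))
  vals <- Map(function(x, y) cor(data[[x]], data[[y]]), x_names, y_names)
  # Name each result "<x>_<y>", mirroring the output shown above.
  names(vals) <- paste(x_names, y_names, sep = "_")
  as_tibble(vals)
}

res <- iris %>%
  group_by(Species) %>%
  group_modify(~ pairwise_cor(
    .x,
    xs = c("Sepal.Length", "Sepal.Width"),
    ys = c("Petal.Length", "Petal.Width")
  ))
```

This works, but it needs a hand-written helper and loses the tidyselect interface, which is the convenience across2x() provides.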

The {dplyover} package grew out of this issue, where the recommendation at the time was to put these features in a separate, adjacent package.

{dplyover} has saved me and my colleagues a huge amount of time. However, it is not on CRAN, some bugs [1, 2] have been found, enhancements to the {dplyr} package are not currently supported, and the last commit is 3 years old.

Do you think these really useful functions could be included in the {dplyr} package? Thanks!