Performance slowdown using across within mutate #6985

I believe this is an unexplored performance issue, seemingly relating to the internal function dplyr:::expand_across.

Benchmarked over 1000 repetitions of processing the ames data, there is a marked difference between direct mutation and indirect mutation facilitated by across, seemingly both when using where() selection and when using explicit all_of(c(...)) style selection. The slowdown in the latter case (direct listing through all_of(c(...))) suggests, I think, that the issue isn't related to checking column properties, as where() does.

I think the performance issue is significant, with direct mutation approx 3x faster than mutation mediated by across (766 itr/sec versus 219 to 256 itr/sec in the results below).

# A tibble: 4 × 9
  expression                                             min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr>                                        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 acrosswhere_func(ames_narrow)                       3.69ms   4.26ms      219.    1.75MB     7.70   966    34      4.42s
2 across_all_of_func(ames_narrow)                      3.3ms   3.83ms      256.   64.73KB     8.20   969    31      3.78s
3 direct_mutate_func(ames_narrow)                      1.1ms   1.26ms      766.   48.59KB     8.52   989    11      1.29s
4 direct_mutate_with_class_detect_func(ames_narrow)   1.22ms   1.36ms      722.   71.12KB     8.77   988    12      1.37s
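
For reference, the ratios behind the "approx 3x" figure can be worked out from the itr/sec column above. This is only arithmetic on the reported numbers, not a new measurement; the vector names are shorthand I chose for the four benchmarked expressions.

```r
# itr/sec values copied from the benchmark table above
itr_per_sec <- c(
  acrosswhere                     = 219,
  across_all_of                   = 256,
  direct_mutate                   = 766,
  direct_mutate_with_class_detect = 722
)

# How many times faster direct mutation is than each variant
speedup <- itr_per_sec[["direct_mutate"]] / itr_per_sec
round(speedup, 2)
# acrosswhere: 3.5, across_all_of: 2.99,
# direct_mutate: 1, direct_mutate_with_class_detect: 1.06
```

So the across variants are roughly 3x to 3.5x slower, while adding manual class detection to the direct version costs only about 6%.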

I came across #6897 and considered whether this was related; however, I believe it is something else. Here, when using across, I use the anonymous function syntax as advised.

First a reprex, and then my session info.

library(bench)
library(tidyverse)
library(modeldata)
options("lifecycle_verbosity"="error")

(ames_narrow <- ames |> select(1:5))

num_op <- mean
char_op <- identity

acrosswhere_func <- function(a){
  mutate(a,
         across(where(is.numeric),\(x){num_op(x)}),
         across(where(is.character)|where(is.factor),\(x){char_op(x)}))
}

across_all_of_func <- function(a){
  mutate(a,
         across(all_of(c("Lot_Frontage","Lot_Area")),\(x){num_op(x)}),
         across(all_of(c("MS_SubClass","MS_Zoning","Street")),\(x){char_op(x)}))
}

direct_mutate_func <- function(a){
  mutate(a,
         Lot_Frontage = num_op(Lot_Frontage),
         Lot_Area = num_op(Lot_Area),
         MS_SubClass =  char_op(MS_SubClass),
         MS_Zoning =  char_op(MS_Zoning),
         Street = char_op(Street))
}

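direct_mutate_with_class_detect_func <- function(a){

  # numnames and catnames are not used by the mutate() below; computing them
  # adds the cost of manual class detection to the direct mutation
  l <- map_lgl(a,\(x)is.numeric(x))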
  numnames <- names(l[l])
  l <- map_lgl(a,\(x){is.character(x)|is.factor(x)})
  catnames <- names(l[l])

  mutate(a,
         Lot_Frontage = num_op(Lot_Frontage),
         Lot_Area = num_op(Lot_Area),
         MS_SubClass =  char_op(MS_SubClass),
         MS_Zoning =  char_op(MS_Zoning),
         Street = char_op(Street))
}

b1 <- mark(acrosswhere_func(ames_narrow),
           across_all_of_func(ames_narrow),
           direct_mutate_func(ames_narrow),
           direct_mutate_with_class_detect_func(ames_narrow),iterations = 1000L)

select(b1,1:9)
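
One way to check where the time goes in the across path is to profile it with base R's Rprof(). The following is a minimal sketch, not part of the benchmark above: it assumes a small synthetic data frame (df) in place of ames_narrow, and the function name across_variant, the loop count, and the sampling interval are all arbitrary choices of mine. Whether expand_across shows up near the top of the output is exactly what this would help to confirm or rule out.

```r
library(dplyr)

# Synthetic stand-in for ames_narrow (assumption: any small mixed-type
# data frame should exercise the same code path)
df <- data.frame(
  num_a = runif(1000),
  num_b = runif(1000),
  chr_a = sample(letters, 1000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Same shape as across_all_of_func, reduced to a single across() call
across_variant <- function(a) {
  mutate(a, across(all_of(c("num_a", "num_b")), \(x) mean(x)))
}

# Sample the call stack every 5 ms while calling the function repeatedly
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.005)
for (i in seq_len(500)) across_variant(df)
Rprof(NULL)

# Functions ranked by total time, including time spent in their callees
prof <- summaryRprof(prof_file)
head(prof$by.total, 15)
```

The by.total table lists each function with the share of samples in which it was on the call stack, so dplyr internals that dominate the across path should appear near the top.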

session info

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] modeldata_1.2.0 lubridate_1.9.3 forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4     purrr_1.0.2     readr_2.1.5    
 [8] tidyr_1.3.0     tibble_3.2.1    ggplot2_3.4.4   tidyverse_2.0.0 bench_1.1.3    

loaded via a namespace (and not attached):
 [1] rstudioapi_0.15.0 magrittr_2.0.3    hms_1.1.3         tidyselect_1.2.0  munsell_0.5.0     timechange_0.2.0 
 [7] colorspace_2.1-0  R6_2.5.1          rlang_1.1.3       fansi_1.0.4       tools_4.2.2       grid_4.2.2       
[13] gtable_0.3.4      utf8_1.2.3        cli_3.6.2         withr_2.5.0       lifecycle_1.0.3   tzdb_0.4.0       
[19] vctrs_0.6.5       glue_1.6.2        stringi_1.7.8     compiler_4.2.2    pillar_1.9.0      generics_0.1.3   
[25] scales_1.2.1      profmem_0.6.0     pkgconfig_2.0.3
