Closed stufield closed 5 years ago
I'm getting a much less dramatic difference with the dev version (0.8.0):
suppressMessages(library(dplyr))
library(purrr)
library(bench)
library(tibble)
#> Warning: package 'tibble' was built under R version 3.5.2
set.seed(101)
# Create a tibble with `n` columns, each holding 100 Gaussian draws (mean = 100)
# Columns are named p1 ... p_n
n <- 2500
df <- rerun(n, rnorm(100, mean = 100)) %>%
  as_tibble(.name_repair = "minimal") %>%
  set_names(paste0("p", 1:ncol(.)))
subset <- paste0("p", sample(1:n, n/2)) # random half of columns
# a function to pass to `mutate_at()`
# ratio to entry[1]
ratio <- function(x) x / x[1L]
# Use the bench pkg to compare
# base `apply()` to `mutate_at()`
bnch <- mark(
  base_apply = { df[, subset] <- apply(df[, subset], 2, ratio); df },
  dplyr_mutate_at = { mutate_at(df, subset, ratio) }
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
# Absolute differences
bnch
#> # A tibble: 2 x 10
#> expression min mean median max `itr/sec` mem_alloc n_gc n_itr
#> <chr> <bch:t> <bch:t> <bch:t> <bch> <dbl> <bch:byt> <dbl> <int>
#> 1 base_apply 51.4ms 57.4ms 56.7ms 67ms 17.4 35.8MB 13 9
#> 2 dplyr_mut… 173ms 201ms 176.9ms 253ms 4.97 49.9MB 6 3
#> # … with 1 more variable: total_time <bch:tm>
# Relative differences
summary(bnch, relative = TRUE)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 x 10
#> expression min mean median max `itr/sec` mem_alloc n_gc n_itr
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 base_apply 1 1 1 1 3.50 1 2.17 3
#> 2 dplyr_mut… 3.37 3.50 3.12 3.78 1 1.39 1 1
#> # … with 1 more variable: total_time <dbl>
We might look at it again later to understand where the time is being spent, but for now it looks "good enough".
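For reading the two tables together: the relative summary appears to be each column of the absolute table divided by that column's smallest value. A minimal sketch that re-derives two of the relative columns from the figures printed above (the vectors below are copied by hand from the output, not produced by `bench`):

``` r
# Absolute figures copied from the benchmark output above
median_ms   <- c(base_apply = 56.7, dplyr_mutate_at = 176.9)
itr_per_sec <- c(base_apply = 17.4, dplyr_mutate_at = 4.97)

# Divide each column by its own minimum, rounded as in the printed table
rel_median <- round(median_ms / min(median_ms), 2)
rel_itr    <- round(itr_per_sec / min(itr_per_sec), 2)

rel_median
rel_itr
```

So on this run `mutate_at()` takes roughly three times as long per iteration as `apply()`.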
Yay! Thanks @romainfrancois ... I'm sorry, I should have specified that I was using dplyr 0.7.8. Waiting with bated breath for dplyr 0.8.0!!!! Thank you.
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
The performance effects of tidyeval on `mutate_at()` seem known, see #2813, but from this past issue I understood improvements in `rlang` to have fixed (mitigated?) the issue. However, from the `reprex` below, the issue persists and is more of an issue than I thought (compared to `apply()`).

I often work with big data containing 5000 - 10000 variables and typically want to transform a subset of them by column name. I would prefer to use members of the `tidyverse`, but in this use case, is there a better alternative to base `R`?

Created on 2019-02-04 by the reprex package (v0.2.1)
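On the question of alternatives, one base R option worth trying is `lapply()` over the selected columns, which skips the data-frame-to-matrix conversion that `apply()` performs. A minimal sketch, assuming a plain data frame and the same `ratio()` helper as in the reprex; the small `n` only keeps the example quick, and no timing claim is made here:

``` r
set.seed(101)

# Small stand-in for the wide data set: 50 columns of 100 Gaussian draws
n  <- 50
df <- as.data.frame(replicate(n, rnorm(100, mean = 100)))
names(df) <- paste0("p", seq_len(n))

subset <- paste0("p", sample(seq_len(n), n / 2))  # random half of the columns
ratio  <- function(x) x / x[1L]                   # ratio to the first entry

# Approach from the reprex: apply() coerces the selection to a matrix first
via_apply <- df
via_apply[, subset] <- apply(df[, subset], 2, ratio)

# Alternative: work column by column on the list of vectors
via_lapply <- df
via_lapply[subset] <- lapply(df[subset], ratio)

# Both approaches should give the same values
all.equal(as.matrix(via_apply), as.matrix(via_lapply))
```

Whether this is faster on 5000 - 10000 columns would need to be checked with `bench::mark()` on the real data.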
Session info
``` r
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 3.5.2 (2018-12-20)
#>  os       macOS Mojave 10.14.3
#>  system   x86_64, darwin15.6.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Denver
#>  date     2019-02-04
#>
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version    date       lib source
#>  assertthat    0.2.0      2017-04-11 [1] CRAN (R 3.5.0)
#>  backports     1.1.3      2018-12-14 [1] CRAN (R 3.5.0)
#>  bench       * 1.0.1.9000 2019-02-01 [1] Github (r-lib/bench@3e5d63f)
#>  bindr         0.1.1      2018-03-13 [1] CRAN (R 3.5.0)
#>  bindrcpp    * 0.2.2      2018-03-29 [1] CRAN (R 3.5.0)
#>  callr         3.1.1      2018-12-21 [1] CRAN (R 3.5.0)
#>  cli           1.0.1      2018-09-25 [1] CRAN (R 3.5.0)
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.0)
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.5.0)
#>  devtools      2.0.1      2018-10-26 [1] CRAN (R 3.5.1)
#>  digest        0.6.18     2018-10-10 [1] CRAN (R 3.5.0)
#>  dplyr       * 0.7.8      2018-11-10 [1] CRAN (R 3.5.0)
#>  evaluate      0.12       2018-10-09 [1] CRAN (R 3.5.0)
#>  fansi         0.4.0      2018-10-05 [1] CRAN (R 3.5.0)
#>  fs            1.2.6      2018-08-23 [1] CRAN (R 3.5.0)
#>  glue          1.3.0      2018-07-17 [1] CRAN (R 3.5.0)
#>  highr         0.7        2018-06-09 [1] CRAN (R 3.5.0)
#>  htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.5.0)
#>  knitr         1.21       2018-12-10 [1] CRAN (R 3.5.1)
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.0)
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.5.0)
#>  pillar        1.3.1      2018-12-15 [1] CRAN (R 3.5.0)
#>  pkgbuild      1.0.2      2018-10-16 [1] CRAN (R 3.5.0)
#>  pkgconfig     2.0.2      2018-08-16 [1] CRAN (R 3.5.0)
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.5.0)
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.0)
#>  processx      3.2.1      2018-12-05 [1] CRAN (R 3.5.0)
#>  profmem       0.5.0      2018-01-30 [1] CRAN (R 3.5.0)
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.5.0)
#>  purrr       * 0.2.5      2018-05-29 [1] CRAN (R 3.5.0)
#>  R6            2.3.0      2018-10-04 [1] CRAN (R 3.5.0)
#>  Rcpp          1.0.0      2018-11-07 [1] CRAN (R 3.5.0)
#>  remotes       2.0.2      2018-10-30 [1] CRAN (R 3.5.0)
#>  rlang         0.3.1      2019-01-08 [1] CRAN (R 3.5.2)
#>  rmarkdown     1.11       2018-12-08 [1] CRAN (R 3.5.0)
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.0)
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.5.0)
#>  stringi       1.2.4      2018-07-20 [1] CRAN (R 3.5.0)
#>  stringr       1.3.1      2018-05-10 [1] CRAN (R 3.5.0)
#>  testthat      2.0.1      2018-10-13 [1] CRAN (R 3.5.0)
#>  tibble      * 2.0.1      2019-01-12 [1] CRAN (R 3.5.2)
#>  tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.5.0)
#>  usethis       1.4.0      2018-08-14 [1] CRAN (R 3.5.0)
#>  utf8          1.1.4      2018-05-24 [1] CRAN (R 3.5.0)
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.0)
#>  xfun          0.4        2018-10-23 [1] CRAN (R 3.5.0)
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.0)
#>
#> [1] /Users/sfield/r_libs
#> [2] /Library/Frameworks/R.framework/Versions/3.5/Resources/library
```