Closed dfalster closed 2 years ago
Here's output from Daniel's iMac (27-inch Retina Late 2014, 4 GHz Quad-Core Intel Core i7)
# A tibble: 6 × 14
expression strategy min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 scm FF16 1.02s 1.02s 0.977 17.06MB 0.977 1 1 1.02s <NULL> <Rprofmem [690 × 3]> <bench_tm [1]> <tibble>
2 build_schedule FF16 2.88s 2.88s 0.347 8.04MB 0.347 1 1 2.88s <NULL> <Rprofmem [4,830 × 3]> <bench_tm [1]> <tibble>
3 scm FF16w 984.23ms 984.23ms 1.02 17.07MB 1.02 1 1 984.23ms <NULL> <Rprofmem [692 × 3]> <bench_tm [1]> <tibble>
4 build_schedule FF16w 2.91s 2.91s 0.343 7.97MB 0.343 1 1 2.91s <NULL> <Rprofmem [4,648 × 3]> <bench_tm [1]> <tibble>
5 scm K93 723.63ms 723.63ms 1.38 52.48KB 0 1 0 723.63ms <NULL> <Rprofmem [119 × 3]> <bench_tm [1]> <tibble>
6 build_schedule K93 8.69s 8.69s 0.115 16.66MB 0.115 1 1 8.69s <NULL> <Rprofmem [8,453 × 3]> <bench_tm [1]> <tibble>
Daniel's 2018 MacBook Pro (2.7 GHz Quad-Core Intel Core i7)
# A tibble: 6 × 14
expression strategy min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list>
1 scm FF16 1.04s 1.04s 0.961 17.06MB 0.961 1 1 1.04s <NULL> <Rprofme…
2 build_schedule FF16 2.82s 2.82s 0.355 8.04MB 0 1 0 2.82s <NULL> <Rprofme…
3 scm FF16w 1.06s 1.06s 0.944 17.07MB 0 1 0 1.06s <NULL> <Rprofme…
4 build_schedule FF16w 2.8s 2.8s 0.357 7.97MB 0 1 0 2.8s <NULL> <Rprofme…
5 scm K93 866.16ms 866.16ms 1.15 52.48KB 0 1 0 866.16ms <NULL> <Rprofme…
6 build_schedule K93 9.81s 9.81s 0.102 16.66MB 0.102 1 1 9.81s <NULL> <Rprofme…
The results above agree with published benchmarks for these machines, which have them as iMac (1051) vs MBP (1003), so pretty similar.
Old lab iMac (27-inch, Late 2013, 3.5 GHz Quad-Core Intel Core i7)
# A tibble: 6 × 14
expression strategy min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 scm FF16 1.15s 1.15s 0.871 17.36MB 0.871 1 1 1.15s <NULL> <Rprofmem> <bench_tm> <tibble>
2 build_schedule FF16 3.46s 3.46s 0.289 8.64MB 0 1 0 3.46s <NULL> <Rprofmem> <bench_tm> <tibble>
3 scm FF16w 1.16s 1.16s 0.864 17.07MB 0.864 1 1 1.16s <NULL> <Rprofmem> <bench_tm> <tibble>
4 build_schedule FF16w 3.38s 3.38s 0.296 7.97MB 0 1 0 3.38s <NULL> <Rprofmem> <bench_tm> <tibble>
5 scm K93 819.11ms 819.11ms 1.22 52.48KB 0 1 0 819.11ms <NULL> <Rprofmem> <bench_tm> <tibble>
6 build_schedule K93 9.88s 9.88s 0.101 16.66MB 0.101 1 1 9.88s <NULL> <Rprofmem> <bench_tm> <tibble>
Isaac's 2021 Dell Latitude 5420 11th Gen Intel(R) Core(TM) i5-1145G7 @ 2.60GHz 1.50 GHz
# A tibble: 6 × 14
expression strategy min median `itr/sec` mem_al…¹ gc/se…² n_itr n_gc total…³ result memory
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:by> <dbl> <int> <dbl> <bch:t> <list> <list>
1 scm FF16 2.34s 2.34s 0.427 17.83MB 0.427 1 1 2.34s <NULL> <Rprofmem>
2 build_schedule FF16 7.25s 7.25s 0.138 8.64MB 0 1 0 7.25s <NULL> <Rprofmem>
3 scm FF16w 2.44s 2.44s 0.411 17.07MB 0.411 1 1 2.44s <NULL> <Rprofmem>
4 build_schedule FF16w 7.49s 7.49s 0.134 7.97MB 0 1 0 7.49s <NULL> <Rprofmem>
5 scm K93 1.58s 1.58s 0.634 52.48KB 0.634 1 1 1.58s <NULL> <Rprofmem>
6 build_schedule K93 19.53s 19.53s 0.0512 16.66MB 0 1 0 19.53s <NULL> <Rprofmem>
# … with 2 more variables: time <list>, gc <list>, and abbreviated variable names ¹mem_alloc, ²`gc/sec`,
# ³total_time
Phil’s home PC (AMD Ryzen Threadripper 1950X 16-Core Processor 3.40 GHz). Excuse the screenshot, Windows is struggling with the paste function.
Hi @pzylstra - you seem to have uploaded a screenshot of results from my machine.
2020 AMD Ryzen 5 3600 (Debian) - seems pretty poor actually, especially compared to the 2013 iMac. Maybe RStudio runs differently on Linux compared to macOS...
# A tibble: 6 × 14
expression strategy min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 scm FF16 2.51s 2.51s 0.398 17.06MB 0 1 0 2.51s <NULL> <Rprofmem [692 × 3]> <bench_tm [1]> <tibble [1 × 3]>
2 build_schedule FF16 8.11s 8.11s 0.123 7.96MB 0 1 0 8.11s <NULL> <Rprofmem [4,646 × 3]> <bench_tm [1]> <tibble [1 × 3]>
3 scm FF16w 2.56s 2.56s 0.391 17.07MB 0 1 0 2.56s <NULL> <Rprofmem [692 × 3]> <bench_tm [1]> <tibble [1 × 3]>
4 build_schedule FF16w 8.26s 8.26s 0.121 7.97MB 0.121 1 1 8.26s <NULL> <Rprofmem [4,648 × 3]> <bench_tm [1]> <tibble [1 × 3]>
5 scm K93 2.3s 2.3s 0.434 52.48KB 0 1 0 2.3s <NULL> <Rprofmem [119 × 3]> <bench_tm [1]> <tibble [1 × 3]>
6 build_schedule K93 27.52s 27.52s 0.0363 16.66MB 0 1 0 27.52s <NULL> <Rprofmem [8,453 × 3]> <bench_tm [1]> <tibble [1 × 3]>
When I run the benchmarks through my OS terminal directly (without RStudio), I get significantly better results:
# A tibble: 6 × 14
expression strategy min median `itr/sec` mem_alloc `gc/sec` n_itr
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int>
1 scm FF16 635.85ms 635.85ms 1.57 20.37MB 0 1
2 build_schedule FF16 1.99s 1.99s 0.503 8.19MB 0 1
3 scm FF16w 642.06ms 642.06ms 1.56 17.73MB 0 1
4 build_schedule FF16w 2.01s 2.01s 0.497 7.98MB 0 1
5 scm K93 473.33ms 473.33ms 2.11 1.13MB 0 1
6 build_schedule K93 5.75s 5.75s 0.174 16.68MB 0.174 1
I have a feeling RStudio is running the console weirdly on Linux, maybe through some sort of emulation layer.
No, I replied to yours so it had your figures; it seems that GitHub dropped the screenshot from mine though. These are the results from my 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz 3.00 GHz
expression strategy min median itr/sec mem_alloc gc/sec n_itr n_gc total_time result memory time gc
MacBook Pro (14-inch, 2021, M1 Pro, 8 cores)
# A tibble: 6 × 14
expression strategy min median itr/se…¹ mem_a…² gc/se…³ n_itr n_gc
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:b> <dbl> <int> <dbl>
1 scm FF16 721.93ms 721.93ms 1.39 20.34MB 0 1 0
2 build_schedule FF16 2.23s 2.23s 0.447 8.17MB 0 1 0
3 scm FF16w 747.41ms 747.41ms 1.34 18.26MB 0 1 0
4 build_schedule FF16w 2.29s 2.29s 0.437 7.99MB 0 1 0
5 scm K93 538.95ms 538.95ms 1.86 1.13MB 0 1 0
6 build_schedule K93 6.28s 6.28s 0.159 16.68MB 0 1 0
AMD Ryzen 7 3700X (2019) - Ubuntu 22.04 (Jammy)
Performing similarly to @devmitch on Linux + Ryzen but with no difference between RStudio and the terminal.
My compiler is gcc 11.2.0 - @dfalster are your Macs running clang by default?
# A tibble: 6 × 14
expression strategy min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 scm FF16 2.64s 2.64s 0.379 17.83MB 0.758 1 2 2.64s <NULL> <Rprofmem [922 × 3]> <bench_tm [1]> <tibble [1 × 3]>
2 build_schedule FF16 8.15s 8.15s 0.123 8.64MB 0.123 1 1 8.15s <NULL> <Rprofmem [6,418 × 3]> <bench_tm [1]> <tibble [1 × 3]>
3 scm FF16w 2.65s 2.65s 0.378 17.07MB 0 1 0 2.65s <NULL> <Rprofmem [692 × 3]> <bench_tm [1]> <tibble [1 × 3]>
4 build_schedule FF16w 8.31s 8.31s 0.120 7.97MB 0 1 0 8.31s <NULL> <Rprofmem [4,648 × 3]> <bench_tm [1]> <tibble [1 × 3]>
5 scm K93 2.17s 2.17s 0.461 52.48KB 0 1 0 2.17s <NULL> <Rprofmem [119 × 3]> <bench_tm [1]> <tibble [1 × 3]>
6 build_schedule K93 26s 26s 0.0385 16.66MB 0 1 0 26s <NULL> <Rprofmem [8,458 × 3]> <bench_tm [1]> <tibble [1 × 3]>
R -e "devtools::load_all(); run_plant_benchmarks()"
# A tibble: 6 × 14
expression strategy min median itr/se…¹ mem_a…² gc/se…³ n_itr n_gc
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:b> <dbl> <int> <dbl>
1 scm FF16 2.62s 2.62s 0.382 17.85MB 0.382 1 1
2 build_schedule FF16 8.07s 8.07s 0.124 8.71MB 0 1 0
3 scm FF16w 2.6s 2.6s 0.385 17.07MB 0.385 1 1
4 build_schedule FF16w 8.26s 8.26s 0.121 7.97MB 0 1 0
5 scm K93 2.17s 2.17s 0.462 52.48KB 0 1 0
6 build_schedule K93 25.99s 25.99s 0.0385 16.66MB 0 1 0
On Becca's MacBook Air (M1, 2020)
# A tibble: 6 × 14
expression strategy min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 scm FF16 1.14s 1.14s 0.874 17.06MB 0 1 0 1.14s <NULL> <Rprofmem [690 × 3]> <bench_tm [1]> <tibble [1 × 3]>
2 build_schedule FF16 3.75s 3.75s 0.267 7.96MB 0 1 0 3.75s <NULL> <Rprofmem [4,646 × 3]> <bench_tm [1]> <tibble [1 × 3]>
3 scm FF16w 1.16s 1.16s 0.860 17.07MB 0 1 0 1.16s <NULL> <Rprofmem [855 × 3]> <bench_tm [1]> <tibble [1 × 3]>
4 build_schedule FF16w 3.8s 3.8s 0.263 7.97MB 0 1 0 3.8s <NULL> <Rprofmem [5,360 × 3]> <bench_tm [1]> <tibble [1 × 3]>
5 scm K93 712.91ms 712.91ms 1.40 52.48KB 0 1 0 712.91ms <NULL> <Rprofmem [119 × 3]> <bench_tm [1]> <tibble [1 × 3]>
6 build_schedule K93 7.61s 7.61s 0.131 16.66MB 0 1 0 7.61s <NULL> <Rprofmem [8,453 × 3]> <bench_tm [1]> <tibble [1 × 3]>
Falster lab PC (HP EliteDesk 800 G3 SFF, Intel(R) Core(TM) i5-6500 CPU, Ubuntu 20.04.4 LTS). Seems like Linux machines are taking a really long time when running through RStudio, like Mitch's example?
# A tibble: 6 × 14
expression strategy min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 scm FF16 3.31s 3.31s 0.303 17.85MB 0.303 1 1 3.31s <NULL> <Rprofmem> <bench_tm [1]> <tibble>
2 build_schedule FF16 10.36s 10.36s 0.0966 8.64MB 0 1 0 10.36s <NULL> <Rprofmem> <bench_tm [1]> <tibble>
3 scm FF16w 3.29s 3.29s 0.304 17.07MB 0.304 1 1 3.29s <NULL> <Rprofmem> <bench_tm [1]> <tibble>
4 build_schedule FF16w 10.43s 10.43s 0.0959 8.04MB 0 1 0 10.43s <NULL> <Rprofmem> <bench_tm [1]> <tibble>
5 scm K93 2.76s 2.76s 0.363 52.48KB 0 1 0 2.76s <NULL> <Rprofmem> <bench_tm [1]> <tibble>
6 build_schedule K93 32.22s 32.22s 0.0310 16.66MB 0 1 0 32.22s <NULL> <Rprofmem> <bench_tm [1]> <tibble>
Here's the summary
scm (s)  build (s)  Who       Machine
0.635    1.99       Mitch     2020 AMD Ryzen 5 3600 (Debian)
0.72     2.23       Kathleen  2021 14" MacBook Pro M1 Pro @ 3.2 GHz (8 cores)
1.02     2.88       Daniel    2014 27" iMac Intel i7 @ 4 GHz Quad-Core
1.04     2.82       Daniel    2018 MacBook Pro Intel i7 @ 2.7 GHz Quad-Core
1.1      3.37       Phil      11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz 3.00 GHz
1.14     3.75       Becca     2020 MacBook Air M1 @ 3.2 GHz (8 cores)
1.15     3.46       lab       2013 27" iMac Intel Core i7 @ 3.5 GHz Quad-Core
2.34     7.25       Isaac     2021 Dell Latitude 5420 Intel i5-1145G7 @ 2.60GHz
2.64     8.15       Andrew    2019 AMD Ryzen 7 3700X - Ubuntu 22.04 (Jammy)
3.31     10.36      lab       2018? HP EliteDesk 800 G3 SFF, Intel i5-6500 CPU Ubuntu
The bottom 3 are more than twice as slow as the next one up. NB @aornugent @itowers1
Mitch's machine is shockingly fast. The new Apple M2 MacBook should come close to Mitch's machine.
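For anyone wanting to check the relative slowdowns, here is a rough sketch in base R using the FF16 timings from the summary table above. The data frame and its column names are made up for illustration; they are not part of plant.

```r
# FF16 timings in seconds, copied from the summary table above.
timings <- data.frame(
  who   = c("Mitch", "Kathleen", "Daniel (iMac)", "Daniel (MBP)", "Phil",
            "Becca", "lab (iMac)", "Isaac", "Andrew", "lab (HP)"),
  scm   = c(0.635, 0.72, 1.02, 1.04, 1.1, 1.14, 1.15, 2.34, 2.64, 3.31),
  build = c(1.99, 2.23, 2.88, 2.82, 3.37, 3.75, 3.46, 7.25, 8.15, 10.36)
)

# Sort from fastest to slowest on the scm benchmark.
timings <- timings[order(timings$scm), ]

# Slowdown of each machine relative to the fastest one.
timings$scm_rel   <- round(timings$scm / min(timings$scm), 2)
timings$build_rel <- round(timings$build / min(timings$build), 2)

# Ratio between each machine and the next one up in the table.
timings$scm_step <- c(NA, round(timings$scm[-1] / timings$scm[-nrow(timings)], 2))

print(timings)
```

The largest step in `scm_step` is between the 2013 lab iMac (1.15 s) and Isaac's laptop (2.34 s), which is where the factor of two comes from.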
MacBook Pro 13" M2 8C CPU/ 10C GPU/ 24GB RAM/ 512GB SSD
# A tibble: 6 × 14
expression strategy min median `itr/sec` mem_al…¹ gc/se…² n_itr n_gc total_…³ result memory time
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:by> <dbl> <int> <dbl> <bch:tm> <list> <list> <list>
1 scm FF16 673.18ms 673.18ms 1.49 20.49MB 0 1 0 673.18ms <NULL> <Rprofmem> <bench_tm>
2 build_schedule FF16 2.08s 2.08s 0.481 8.14MB 0.481 1 1 2.08s <NULL> <Rprofmem> <bench_tm>
3 scm FF16w 680.16ms 680.16ms 1.47 18.26MB 0 1 0 680.16ms <NULL> <Rprofmem> <bench_tm>
4 build_schedule FF16w 2.1s 2.1s 0.476 7.99MB 0 1 0 2.1s <NULL> <Rprofmem> <bench_tm>
5 scm K93 480.75ms 480.75ms 2.08 1.13MB 0 1 0 480.75ms <NULL> <Rprofmem> <bench_tm>
6 build_schedule K93 5.65s 5.65s 0.177 16.68MB 0 1 0 5.65s <NULL> <Rprofmem> <bench_tm>
Super fast!!!
I was able to speed up my installation of plant by setting the compiler flags to:
CXXFLAGS=-O3
in ~/.R/Makevars
# A tibble: 6 × 14
expression strategy min median itr/se…¹ mem_a…² gc/se…³ n_itr n_gc
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:b> <dbl> <int> <dbl>
1 scm FF16 678.42ms 678.42ms 1.47 17.91MB 1.47 1 1
2 build_schedule FF16 1.94s 1.94s 0.514 8.66MB 0 1 0
3 scm FF16w 657.92ms 657.92ms 1.52 17.07MB 0 1 0
4 build_schedule FF16w 1.95s 1.95s 0.513 8.04MB 0 1 0
5 scm K93 457.18ms 457.18ms 2.19 52.48KB 0 1 0
6 build_schedule K93 5.49s 5.49s 0.182 16.66MB 0 1 0
Reference: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
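If you would rather not edit ~/.R/Makevars directly, something like the following might work from within R. This is only a sketch: it writes the same flag to a Makevars file in a temporary directory (the directory name is arbitrary) and points the R_MAKEVARS_USER environment variable at it for the current session. Whether the flag ends up in the final compiler call depends on how the package is built, so check the compiler output.

```r
# Write a Makevars file requesting -O3 into a temporary directory,
# so an existing ~/.R/Makevars is left untouched.
makevars_dir <- file.path(tempdir(), "dotR")   # arbitrary location for this sketch
dir.create(makevars_dir, showWarnings = FALSE, recursive = TRUE)
makevars <- file.path(makevars_dir, "Makevars")

writeLines("CXXFLAGS=-O3", makevars)

# Tell R (and the build processes started from this session) to use this file
# as the personal Makevars.
Sys.setenv(R_MAKEVARS_USER = makevars)

# Confirm what was written.
readLines(makevars)
```

After this, reinstalling the package from the same session should pick up the file, assuming nothing else overrides the flags later in the compiler call.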
Ooh yeah!!
MacBook Pro 14" M2 Pro 12-C CPU 19-C GPU/16C NE/32GB/1TB
# A tibble: 6 × 14
expression strategy min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
1 scm FF16 421.18ms 421.18ms 2.37 19.44MB 0 1 0
2 build_sche… FF16 1.07s 1.07s 0.938 8.24MB 0 1 0
3 scm FF16r 5.24s 5.24s 0.191 17.73MB 0 1 0
4 build_sche… FF16r 15.65s 15.65s 0.0639 7.35MB 0.0639 1 1
5 scm K93 261.86ms 261.86ms 3.82 651.72KB 0 1 0
6 build_sche… K93 1.15s 1.15s 0.869 12.58MB 0.869 1 1
New speed record!
I now understand a bit better what is causing the variation in speed between different installation methods on a single machine. @itowers1 @aornugent this will be relevant for you.
It comes down to the method used to compile the C++ code and whether it is compiled in debug mode.
A debug build helps you diagnose calls in the stack and causes of errors, but leads to much slower runtime, because it adds debug symbols (-g) and switches off optimisation (-O0). You can see these options at the end of the compiler call as -g -O0, e.g.
clang++ -arch arm64 -std=gnu++11 -I ... -fPIC -falign-functions=64 -Wall -g -O2 -UNDEBUG -Wall -pedantic -g -O0 -c RcppExports.cpp -o RcppExports.o
By default, pkgbuild::compile_dll has argument debug=TRUE, which causes the code to be compiled with debug symbols and without optimisation, giving slow runtime.
So building via terminal (with make) or devtools::load_all() will be slow.
If you want to optimise the code for speed, you need to compile without the debug flags (-g -O0). This is done by setting debug=FALSE in pkgbuild::compile_dll. This leads to a compiler call like
clang++ -arch arm64 -std=gnu++11 -DNDEBUG -I ... -fPIC -falign-functions=64 -Wall -g -O2 -Wall -pedantic -fdiagnostics-color=always -c ff16_strategy.cpp -o ff16_strategy.o
Also, installing the package leads to optimised compilation, so using R CMD INSTALL or devtools::install() will be fast.
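To see which optimisation level a given compiler call actually ends up with, here is a small base R sketch. It relies on the fact that GCC and clang use the last -O option when several are given. The helper name `effective_opt_level` is made up for this example, and the two command strings are copied from the calls quoted above.

```r
# Return the last -O<level> flag in a compiler command line.
# GCC and clang apply the last -O option when more than one is present.
effective_opt_level <- function(cmd) {
  tokens <- strsplit(cmd, "\\s+")[[1]]          # split the call into flags
  flags  <- grep("^-O", tokens, value = TRUE)   # keep -O0, -O2, -O3, ...
  if (length(flags) == 0) NA_character_ else flags[length(flags)]
}

debug_call <- "clang++ -arch arm64 -std=gnu++11 -I ... -fPIC -falign-functions=64 -Wall -g -O2 -UNDEBUG -Wall -pedantic -g -O0 -c RcppExports.cpp -o RcppExports.o"
fast_call  <- "clang++ -arch arm64 -std=gnu++11 -DNDEBUG -I ... -fPIC -falign-functions=64 -Wall -g -O2 -Wall -pedantic -fdiagnostics-color=always -c ff16_strategy.cpp -o ff16_strategy.o"

effective_opt_level(debug_call)  # "-O0"
effective_opt_level(fast_call)   # "-O2"
```

In the debug call the trailing -O0 overrides the earlier -O2, which is why that build is slow even though -O2 also appears in the command.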
So when developing, the following workflow gets the best of both worlds.
After changing some cpp code, run
pkgbuild::compile_dll(debug=FALSE, compile_attributes = FALSE)
devtools::load_all()
The first line recompiles with optimisation, and the second line loads the package with the new code. If the code is already compiled, devtools::load_all won't recompile it.
If you skip the first line, your code will be recompiled with debug symbols, and will be slow.
Also, including compile_attributes = FALSE in the first line avoids the need to recompile the Rcpp exports.
You'll want to do something different when
I'm going to update the Makefile too, so that it uses optimised code by default.
> pkgbuild::compile_dll(debug=FALSE, compile_attributes = FALSE)
clang++ -arch arm64 -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/opt/R/arm64/lib -o plant.so RcppExports.o RcppR6.o adaptive_interpolator.o cohort_schedule.o control.o disturbance.o ff16_cohort.o ff16_strategy.o ff16r_cohort.o ff16r_strategy.o gradient.o interpolator.o k93_cohort.o k93_strategy.o ode_control.o plant_tools.o qag.o qag_internals.o qk.o qk_rules.o scm_utils.o tk_spline.o uniroot.o util.o util_post_rcpp.o water_strategy.o -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
installing to /private/var/folders/0x/nplts4jd5615dw_pr_np25540000gq/T/Rtmpkt12tt/devtools_install_75075f0e18d9/00LOCK-plant/00new/plant/libs
** checking absolute paths in shared objects and dynamic libraries
─ DONE (plant)
> devtools::load_all()
ℹ Loading plant
> run_plant_benchmarks(strategy_types = list(FF16 = FF16_Strategy))
Running benchmarks via `run_plant_benchmarks`
Running with:
strategy
1 FF16
# A tibble: 2 × 14
expression strategy min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
<bch:expr> <chr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
1 scm FF16 430.64ms 430.64ms 2.32 17.09MB 0 1 0
2 build_sche… FF16 1.03s 1.03s 0.969 8.14MB 0 1 0
Fast!!!
Supposedly one can also set the optimisation level using a Makevars file, saved at ~/.R/Makevars. I tried this but it didn't work for me, although it did work for Andrew (see above). For some reason pkgbuild::compile_dll is not using the Makevars file on my machine.
Hi @aornugent @Becca-90 @fjrrobinson @devmitch @itowers1 @pzylstra
We've been discussing speed lately, so I thought it would be good to compare the speed of plant across our different machines. I actually set up a function run_plant_benchmarks for this very purpose, building off the bench package. So can you please post results in the issue below noting
To run, please check out the develop branch then