stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0

Built with LTO #238

Open bgoodri opened 6 years ago

bgoodri commented 6 years ago

Summary:

Once the feature/2.17 branch is merged (or while that merge is in progress), we need to figure out how to build rstanarm with LTO whenever possible.

Description:

Adding -flto=8 to ~/.R/Makevars reduces the compile time to about 30 seconds per Stan program (in parallel), plus about 90 seconds to link.
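For anyone who wants to try this setting without touching their real ~/.R/Makevars, here is a minimal sketch. It assumes a GCC-based R toolchain on Linux and a package that is compiled with the C++14 compiler, and it appends the flag to CXX14 the same way the Makevars quoted later in this thread does. The scratch file and the choice of CXX14 are assumptions, not a recipe confirmed in this issue. R_MAKEVARS_USER is the environment variable R consults for a user-level Makevars file.

```shell
# Write a scratch Makevars instead of editing ~/.R/Makevars directly.
scratch_makevars="$(mktemp)"
cat > "$scratch_makevars" <<'EOF'
# Assumption: the package compiles its C++ sources with the CXX14 compiler.
# Appending to the compiler command should reach both compile and link steps.
CXX14 += -flto=8
EOF

# Point R at the scratch file for this shell session only.
export R_MAKEVARS_USER="$scratch_makevars"

# Then install from a source checkout, for example:
#   R CMD INSTALL rstanarm
```

Unsetting R_MAKEVARS_USER (or closing the shell) restores the usual ~/.R/Makevars lookup, which makes it easy to compare build times with and without the flag.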

R Version:

3.4.x

Operating System:

Debian

bgoodri commented 6 years ago

@aadler Is it possible to build an R package like rstanarm on Windows with LTO? I am not expecting it to execute any faster but hoping the build process takes fewer resources on win-builder.

aadler commented 6 years ago

@bgoodri Hi. LTO as implemented in GCC 4.9.3, which underlies the current version of Rtools, is either incomplete or incorrect. When I tried to build even base R with LTO, it interfered with various packages like dplyr, so I had to install some from source and some from binary. Even when installing from source, I had to change the Makevars and install some packages with LTO and some without, which then requires -ffat-lto-objects, which in turn increases file size.
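To illustrate why mixing LTO and non-LTO builds calls for fat LTO objects, here is a small sketch outside of R. It assumes a GCC with LTO support is available on the PATH as gcc; the file names and the add function are made up for the illustration. A fat object carries ordinary machine code alongside the LTO intermediate representation, so it can take part in a link where LTO is switched off, at the cost of a larger file.

```shell
# Do nothing when gcc is unavailable (assumption: "gcc" is the GCC driver).
if command -v gcc >/dev/null 2>&1; then
  workdir="$(mktemp -d)"

  # Hypothetical two-file program: one unit built with LTO, one without.
  printf 'int add(int a, int b) { return a + b; }\n' > "$workdir/add.c"
  cat > "$workdir/main.c" <<'EOF'
#include <stdio.h>
int add(int a, int b);
int main(void) { printf("%d\n", add(2, 3)); return 0; }
EOF

  # Fat LTO object: LTO bytecode plus ordinary object code.
  gcc -O2 -flto -ffat-lto-objects -c "$workdir/add.c" -o "$workdir/add_fat.o"
  # Slim LTO object, built only to compare file sizes.
  gcc -O2 -flto -fno-fat-lto-objects -c "$workdir/add.c" -o "$workdir/add_slim.o"
  # Ordinary object compiled without LTO.
  gcc -O2 -c "$workdir/main.c" -o "$workdir/main.o"

  # Link with LTO explicitly disabled; the fat object can still take part
  # because it also carries ordinary machine code.
  gcc -fno-lto "$workdir/main.o" "$workdir/add_fat.o" -o "$workdir/demo"
  demo_output="$("$workdir/demo")"
  echo "$demo_output"

  # Show the size cost of the fat object relative to the slim one.
  wc -c "$workdir/add_fat.o" "$workdir/add_slim.o"
fi
```

In an R setting the analogous step would be adding -ffat-lto-objects next to -flto in the relevant Makevars flags, so that packages built with LTO can still be linked against ones built without it.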

The one time I tried to build R with a custom-built version of GCC 7.1.0, if I recall correctly, I used LTO and it worked better, but I don't remember clearly.

Therefore, after months of trying, I stopped recommending LTO for building on Windows, and I will keep advising against it until Rtools moves to at least GCC 7. On that front, @jeroen (aka @ropensci) is the new keeper of Rtools. All bribes should be sent his way ;)

jeroen commented 6 years ago

I am considering whether we should enable LTO for the new toolchain. Can you give me an example that shows why this would be useful to support?

bgoodri commented 6 years ago

@jeroen It would be great if LTO worked with the Windows C++ toolchain.

Currently, installing rstanarm on CRAN under r-release-windows-ix86+x86_64 takes 2135 seconds and the shared object consumes 17.3Mb of disk space.

On Linux with g++-7, installation time with LTO is 75% of the time without LTO and disk space consumed with LTO is also 75% of the disk space without LTO.

Additionally, although it is not a problem for CRAN, installing rstanarm in parallel with 8 cores on Linux consumes almost 16 GB of RAM. With LTO, I can get the peak RAM spike down to 13.5 GB.

Someone else has found ( http://discourse.mc-stan.org/t/thinlto-standard-benchmarks/3673?u=bgoodri ) that the execution time for a bunch of Stan models is around 5% less when compiled with LTO (under clang).

aadler commented 6 years ago

@jeroen I'm not sure what you mean by "enable". The toolchain, at least as of 3.4, has LTO enabled, but its implementation in GCC 4.9.3 was incomplete and often not worth it, at least in my many experiments. Once the Windows toolchain is based on GCC 7+, it should certainly continue to be built with LTO enabled.

bgoodri commented 6 years ago

@jeroen @aadler Is there a known trick to compiling a DLL using LTO with R-testing for Windows? In my ~/.R/Makevars, I have

CC = C:\rtools40\mingw64\bin\gcc -m$(WIN)
CXX = C:\rtools40\mingw64\bin\g++ -m$(WIN)
CXX11 = C:\rtools40\mingw64\bin\g++ -m$(WIN)
CXX14 = C:\rtools40\mingw64\bin\g++ -m$(WIN)
CXX14 += -flto=jobserver
LOCAL_CPPFLAGS = -Og -Wno-unused-variable -Wno-unused-function -Wno-unused-local-typedefs
LOCAL_CPPFLAGS += -Wno-ignored-attributes -Wno-deprecated-declarations -Wno-attributes -march=native -mtune=native
AR = C:\rtools40\mingw64\bin\gcc-ar
NM = C:\rtools40\mingw64\bin\gcc-nm
RANLIB = C:\rtools40\mingw64\bin\gcc-ranlib
endif

But when I try to compile a Stan program (e.g. with example(stan_model, package = "rstan")) using CXX14, I get several messages of the form

make[1]: [C:\Users\Stan\AppData\Local\Temp\ccYamjfZ.mk:15: C:\Users\Stan\AppData\Local\Temp\ccnmQXX2.ltrans4.ltrans.o] Error 1 (ignored)

although the DLL does compile and load. However, when I try to execute it, R crashes with a corrupted backtrace (under gdb and removing line breaks from the output)

(gdb) run
Starting program: C:\PROGRA~1\R\R-TEST~1\bin\x64\Rterm.exe
[New Thread 3832.0x1f58]
warning: Invalid parameter passed to C runtime function.
[New Thread 3832.0x1a38]
[New Thread 3832.0x1b80]
[New Thread 3832.0x7a0]
R version 3.6.0 Under development (Testing Rtools) (2018-08-14 r75146) -- "Blame Jeroen"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
library(rstan)
Loading required package: ggplot2
Registered S3 methods overwritten by 'ggplot2':
  method         from
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
Registered S3 method overwritten by 'dplyr':
  method               from
  as.data.frame.tbl_df tibble
Loading required package: StanHeaders
rstan (Version 2.18.1, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores()).
To avoid recompilation of unchanged Stan programs, we recommend calling
rstan_options(auto_write = TRUE)
stancode <- 'data {real y_mean;} parameters {real y;} model {y ~ normal(y_me$
mod <- stan_model(model_code = stancode)
[New Thread 3832.0xa5c]
[Thread 3832.0x1b80 exited with code 0]
[Thread 3832.0x1a38 exited with code 0]
[Thread 3832.0xa5c exited with code 0]
make[1]: [C:\Users\Stan\AppData\Local\Temp\ccwsyYkm.mk:15: C:\Users\Stan\AppData\Local\Temp\ccNiw63a.ltrans4.ltrans.o] Error 1 (ignored)
make[1]: [C:\Users\Stan\AppData\Local\Temp\ccwsyYkm.mk:3: C:\Users\Stan\AppData\Local\Temp\ccNiw63a.ltrans0.ltrans.o] Error 1 (ignored)
make[1]: [C:\Users\Stan\AppData\Local\Temp\ccwsyYkm.mk:24: C:\Users\Stan\AppData\Local\Temp\ccNiw63a.ltrans7.ltrans.o] Error 1 (ignored)
make[1]: [C:\Users\Stan\AppData\Local\Temp\ccwsyYkm.mk:6: C:\Users\Stan\AppData\Local\Temp\ccNiw63a.ltrans1.ltrans.o] Error 1 (ignored)
make[1]: [C:\Users\Stan\AppData\Local\Temp\ccwsyYkm.mk:21: C:\Users\Stan\AppData\Local\Temp\ccNiw63a.ltrans6.ltrans.o] Error 1 (ignored)
make[1]: [C:\Users\Stan\AppData\Local\Temp\ccwsyYkm.mk:27: C:\Users\Stan\AppData\Local\Temp\ccNiw63a.ltrans8.ltrans.o] Error 1 (ignored)
make[1]: [C:\Users\Stan\AppData\Local\Temp\ccwsyYkm.mk:18: C:\Users\Stan\AppData\Local\Temp\ccNiw63a.ltrans5.ltrans.o] Error 1 (ignored)
make[1]: [C:\Users\Stan\AppData\Local\Temp\ccwsyYkm.mk:12: C:\Users\Stan\AppData\Local\Temp\ccNiw63a.ltrans3.ltrans.o] Error 1 (ignored)
make[1]: [C:\Users\Stan\AppData\Local\Temp\ccwsyYkm.mk:30: C:\Users\Stan\AppData\Local\Temp\ccNiw63a.ltrans9.ltrans.o] Error 1 (ignored)
make[1]: [C:\Users\Stan\AppData\Local\Temp\ccwsyYkm.mk:9: C:\Users\Stan\AppData\Local\Temp\ccNiw63a.ltrans2.ltrans.o] Error 1 (ignored)
make[1]: [C:\Users\Stan\AppData\Local\Temp\ccwsyYkm.mk:33: C:\Users\Stan\AppData\Local\Temp\ccNiw63a.ltrans10.ltrans.o] Error 1 (ignored)
[New Thread 3832.0x1cfc]
[Thread 3832.0x1cfc exited with code 0]
post <- sampling(mod, data = list(y_mean = 0), chains = 1, cores = 1)
SAMPLING FOR MODEL '73fc79f8b1915e8208c736914c86d1a1' NOW (CHAIN 1).
Program received signal SIGSEGV, Segmentation fault.
0xffffffff9b3c0000 in ?? ()
(gdb) bt
#0  0xffffffff9b3c0000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

Do either of you have any idea what I am doing wrong?

aadler commented 5 years ago

No, I gave up a while ago. I should try again one of these days with @jeroen's latest and greatest Rtools4. Sorry.