klmr / box

Write reusable, composable and modular R code
https://klmr.me/box/
MIT License

Cannot include Rcpp code by following the vignette #295

Status: open. mlell opened this issue 2 years ago.

mlell commented 2 years ago

Error description

The vignette on compiled code says:

... it is possible to integrate compiled code via R’s SHLIB mechanism for building shared libraries. In particular, this also works with packages such as Rcpp.

However, I can't get the steps outlined there to work: the compiler cannot find the Rcpp headers. Did I miss something, or should the vignette be extended with more steps? https://github.com/klmr/box/issues/13 mentions a very old demo file, which GitHub displays with a warning that it does not belong to this repository, but I do not understand enough to tell whether it is still relevant. Also, it seems to use private Rcpp calls, so I wonder whether the same can be achieved through Rcpp's public API.

This is my code:

box/cpptest/cpp/hello.cpp:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void hello_world(){
    Rcout << "Hello world";
    Rcout << std::endl;
}

box/cpptest/__setup__.R:

build_shared_lib = function () {
  # Change working directory so R finds the Makevars.
  old_dir = setwd(box::file())
  on.exit(setwd(old_dir))
  # Compile all files in the cpp directory
  f <- list.files("cpp", full.names = TRUE)
  exitcode = system2('R', c('CMD', 'SHLIB', f))
  stopifnot(exitcode == 0L)
}

build_shared_lib()

Call:

> box::use(box/cpptest/`__setup__`)
## g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG      -fpic  -g -O2 -ffile-prefix-map=/build/r-base-XqSJAD/r-base-4.0.4=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c  cpp/hello.cpp -o cpp/hello.o
## cpp/hello.cpp:1:10: fatal error: Rcpp.h: No such file or directory
##     1 | #include <Rcpp.h>
##       |          ^~~~~~~~
## compilation terminated.
## make: *** [/usr/lib/R/etc/Makeconf:181: cpp/hello.o] Error 1
## Error in box::use(box/cpptest/`__setup__`) : exitcode == 0L is not TRUE
## (inside “build_shared_lib()”)
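The error means that R CMD SHLIB does not know where Rcpp.h lives. One possible fix is to add the include directory of the installed Rcpp package to the preprocessor flags. The following is a minimal sketch, assuming that make picks up PKG_CPPFLAGS from the environment (a PKG_CPPFLAGS line in the module's Makevars would override it, and would be the alternative). It only addresses the missing header: the // [[Rcpp::export]] attribute is processed by Rcpp::sourceCpp() and Rcpp::compileAttributes(), not by R CMD SHLIB, so the resulting library would still lack a .Call()-compatible entry point for hello_world.

```r
build_shared_lib = function () {
  # Change working directory so R finds the Makevars and the sources.
  old_dir = setwd(box::file())
  on.exit(setwd(old_dir))

  # Header directory of the installed Rcpp package; "" if Rcpp is missing.
  rcpp_include = system.file("include", package = "Rcpp")
  if (! nzchar(rcpp_include)) stop("package 'Rcpp' is not installed")

  # Assumption: make reads PKG_CPPFLAGS from the environment unless a
  # Makevars file sets it. Restore the previous value afterwards.
  old_flags = Sys.getenv("PKG_CPPFLAGS", unset = NA)
  Sys.setenv(PKG_CPPFLAGS = paste0("-I", shQuote(rcpp_include)))
  on.exit(
    if (is.na(old_flags)) Sys.unsetenv("PKG_CPPFLAGS")
    else Sys.setenv(PKG_CPPFLAGS = old_flags),
    add = TRUE
  )

  # Compile all C++ files in the cpp directory.
  src = list.files("cpp", pattern = "\\.cpp$", full.names = TRUE)
  exitcode = system2(file.path(R.home("bin"), "R"), c("CMD", "SHLIB", src))
  stopifnot(exitcode == 0L)
}
```

As in the original, __setup__.R would still call build_shared_lib() at the end.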

R version

platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          4                           
minor          0.4                         
year           2021                        
month          02                          
day            15                          
svn rev        80002                       
language       R                           
version.string R version 4.0.4 (2021-02-15)
nickname       Lost Library Book

‘box’ version

1.1.0

mlell commented 2 years ago

How about this:

Instead of all the code above, maybe one could use Rcpp's caching feature? This requires write access to the module directory or to a cache directory (e.g. https://github.com/klmr/box/issues/264). I still do not know how to unload the linked library when the module is unloaded. (cpp/hello.cpp is the file from the original post.)

box/cpptest/test.R:

# Keep Rcpp exports in a subenvironment
cpp <- new.env()

# Compile, or reuse a cached build if the source is unchanged
Rcpp::sourceCpp(
  file.path(box::file(), "cpp","hello.cpp"), # source file
  cacheDir = box::file("rcpp-cache"),      # permanent cache directory
  env = cpp)  # default is globalenv(), use a local environment instead

#' Hello World function
#' @export
hello_world <- cpp$hello_world

Rcpp decides on its own whether it needs to recompile. Timings of the first import, of a second import in a fresh session, and of a reload after modifying the source:

Restarting R session...

* Project '~/' loaded. [renv 0.15.4]
> system.time(box::use(box/cpptest/test))
   user  system elapsed 
  4.975   0.499   5.593 
> test$hello_world()
Hello world
> rm(list = ls())

Restarting R session...

* Project '~/' loaded. [renv 0.15.4]
> system.time(box::use(box/cpptest/test))
   user  system elapsed 
  0.170   0.002   0.241 
> test$hello_world()
Hello world
> # ........... modify cpp/hello.cpp .............
> system.time(box::reload(test))
   user  system elapsed 
  4.904   0.477   5.422 
> test$hello_world()
Hello earth
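
The sourceCpp() approach needs write access to cacheDir, which a module installed in a read-only location may not have. A minimal sketch of a fallback, assuming R >= 4.0.0 for tools::R_user_dir() and assuming, as the timings above suggest, that sourceCpp() creates cacheDir when it does not exist yet. The helper rcpp_cache_dir(), its id argument and the "boxrcpp" cache name are hypothetical.

```r
# Hypothetical helper: pick a writable cache directory for Rcpp::sourceCpp().
rcpp_cache_dir <- function (module_dir, id = basename(module_dir)) {
  preferred <- file.path(module_dir, "rcpp-cache")
  # file.access(path, mode = 2) returns 0 if the path is writable.
  writable <- function (path) isTRUE(unname(file.access(path, mode = 2L)) == 0L)
  # Use the module directory if the cache (or, if missing, its parent) is writable.
  target <- if (dir.exists(preferred)) preferred else module_dir
  if (writable(target)) return(preferred)
  # Otherwise fall back to a per-user cache (tools::R_user_dir() needs R >= 4.0.0).
  fallback <- file.path(tools::R_user_dir("boxrcpp", which = "cache"), id)
  dir.create(fallback, recursive = TRUE, showWarnings = FALSE)
  fallback
}
```

In test.R, the call would then read Rcpp::sourceCpp(box::file("cpp", "hello.cpp"), cacheDir = rcpp_cache_dir(box::file()), env = cpp). The default id (the base name of the module directory) is a simplification: two modules in directories with the same name would share a fallback cache directory.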
mlell commented 2 years ago

... Rcpp::sourceCpp() returns the build directory, so the module can keep a list of the directories containing its DLLs and unload them later. However, the loading of dependencies such as Rcpp.so is not tracked:


# === Compile or load C++ code ====================

cpp <- new.env()      # R bindings to compiled functions go here
build_dirs <- list()  # Directories that contain DLLs

# Compile/load and save the build path for later unloading

build_dirs$hello_world <- Rcpp::sourceCpp(
  box::file("cpp","hello.cpp"),
  cacheDir = box::file("rcpp-cache"), 
  env = cpp)$buildDirectory

.on_unload <- function(nm){
  # Unlink libraries in all build directories of this module
  pat <- utils::glob2rx(paste0("*",.Platform$dynlib.ext))
  dlls <- list.files(unlist(build_dirs), full.names = TRUE, pattern = pat)
  for(d in dlls) dyn.unload(d) 
}

# === Exports ====================================
#' Hello World 
#' @export
hello_world <- cpp$hello_world
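
The .on_unload() above calls dyn.unload() on every DLL file found in the build directories, whether or not it is currently loaded, and dyn.unload() signals an error for a library that is not loaded. A slightly more defensive sketch consults getLoadedDLLs() and unloads only libraries that are both loaded and located in this module's build directories. loaded_dlls_in() is a hypothetical helper; build_dirs is the list from the code above.

```r
# Hypothetical helper: paths of currently loaded DLLs located inside `dirs`.
loaded_dlls_in <- function (dirs) {
  dirs <- as.character(unlist(dirs))
  # Guard: paste0(character(0), "/") would yield "/" and match everything.
  if (length(dirs) == 0L) return(character(0L))
  norm <- function (p) normalizePath(p, winslash = "/", mustWork = FALSE)
  prefixes <- paste0(norm(dirs), "/")
  paths <- vapply(getLoadedDLLs(), function (dll) dll[["path"]], character(1L))
  paths <- norm(paths)
  inside <- vapply(paths, function (p) any(startsWith(p, prefixes)), logical(1L))
  unname(paths[inside])
}

.on_unload <- function (ns) {
  # Only unload libraries from this module's build directories; Rcpp's own
  # DLL belongs to the Rcpp namespace and stays loaded.
  for (dll in loaded_dlls_in(build_dirs)) dyn.unload(dll)
}
```

Leaving Rcpp.so loaded is arguably correct here, since other code may still depend on the Rcpp namespace.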
klmr commented 2 years ago

Hi,

Rcpp requires a fairly complicated additional setup when invoking the C++ compiler. Unfortunately Rcpp does not (or at least used not to) export the required functions¹, so we need to mess around with its internals. I haven’t tried this in a long time, so some or all of these internals might have changed. But have a look at the source code of the previous Rcpp usage vignette: https://github.com/klmr/modules/blob/develop/vignettes/rcpp/__install__.r.

This won’t work out of the box but it might be adaptable. In particular, pay attention to the following points:

I figured the above out by reverse engineering the internals of Rcpp::sourceCpp.


¹ Unfortunately the official stance is that Rcpp compilation is only supported for packages, not via any other route (except via the wrapper sourceCpp), so none of the above is supported or documented by Rcpp. The easiest route might therefore be to actually generate a mock package directory structure at runtime, copy the source files into the package directory tree, “compile” that package and copy the resulting binary files and R code adapters back. But that might have its own issues, and I never tried it. Ideally ‘box’ would have official support for Rcpp but without cooperation of Rcpp I don’t think this is possible. I might change the vignette to remove mention of Rcpp.
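
The mock-package route from the footnote might look roughly like the following untested sketch. It relies only on Rcpp's package tooling (Rcpp::compileAttributes()) plus R CMD INSTALL, but, unlike the footnote, it installs the generated package into a private library instead of copying the binaries back. The function name build_mock_package, the package name boxrcppmock and all DESCRIPTION fields are hypothetical placeholders; it assumes that Rcpp and a working compiler toolchain are visible to the child R process.

```r
# Hypothetical helper: wrap C++ sources in a throwaway package, let Rcpp's
# package tooling generate the glue code, and install the package into a
# private library. Requires Rcpp and a working compiler toolchain.
build_mock_package = function (src_files, pkg = "boxrcppmock",
                               lib = tempfile("boxlib")) {
  pkg_dir = file.path(tempfile("boxpkg"), pkg)
  dir.create(file.path(pkg_dir, "src"), recursive = TRUE)
  dir.create(file.path(pkg_dir, "R"))
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)

  # Placeholder metadata; LinkingTo makes Rcpp's headers visible.
  writeLines(c(
    paste("Package:", pkg),
    "Version: 0.0.1",
    "Title: Mock Package Wrapping Module C++ Code",
    "Description: Generated at runtime; not meant for distribution.",
    "Author: Nobody",
    "Maintainer: Nobody <nobody@example.com>",
    "License: GPL-3",
    "Imports: Rcpp",
    "LinkingTo: Rcpp"
  ), file.path(pkg_dir, "DESCRIPTION"))

  # Register native routines and export every generated R wrapper.
  writeLines(c(
    sprintf("useDynLib(%s, .registration = TRUE)", pkg),
    "importFrom(Rcpp, sourceCpp)",
    'exportPattern("^[[:alpha:]]")'
  ), file.path(pkg_dir, "NAMESPACE"))

  stopifnot(all(file.copy(src_files, file.path(pkg_dir, "src"))))

  # Generates R/RcppExports.R and src/RcppExports.cpp from the
  # // [[Rcpp::export]] attributes.
  Rcpp::compileAttributes(pkg_dir)

  status = system2(
    file.path(R.home("bin"), "R"),
    c("CMD", "INSTALL", paste0("--library=", shQuote(lib)), shQuote(pkg_dir))
  )
  stopifnot(status == 0L)

  # The returned namespace environment contains the generated R wrappers.
  loadNamespace(pkg, lib.loc = lib)
}
```

From a module, one could then write mock = build_mock_package(box::file("cpp", "hello.cpp")) and export hello_world = mock$hello_world. Rebuilding within the same session after a source change would presumably require unloadNamespace() first, because loadNamespace() returns an already loaded namespace unchanged.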

mlell commented 2 years ago

Hi, thanks for looking into this.

Did I understand your point correctly: Rcpp only supports compilation via packages and via sourceCpp, so if box supported Rcpp directly, it would have to "copy" Rcpp's internal behaviour, which might change in the future?

In that case the approach I suggested above should be a solution, right? The only Rcpp function it needs is sourceCpp, which is the supported route. The documentation of sourceCpp(cacheDir = ...) even says:

Directory to use for caching shared libraries. [...] The default value of tempdir() results in the cache being valid only for the current R session. Pass an alternate directory to preserve the cache across R sessions. (Emphasis mine)

So persistent storage of compiled code seems to be explicitly supported by Rcpp.

Also, if you figured out the mentioned internals from the source of sourceCpp, then calling that function directly should already take care of them, or am I missing something?

klmr commented 2 years ago

For a single C++ source file without dependencies, sourceCpp is the way to go, yes. However, that no longer works for more complex projects with multiple source files and/or third-party dependencies.

mlell commented 2 years ago

I see, thanks for explaining.