paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0
1.27k stars 183 forks source link

Feature request: avoid Rstan model conversion when exposing Stan functions #1512

Closed wds15 closed 1 year ago

wds15 commented 1 year ago

In the current development version of cmdstanr one can expose functions to R directly. See here:

https://discourse.mc-stan.org/t/exposing-stan-user-defined-functions-using-cmdstanr-and-rcpp/27104/5?u=wds15

This functionality has recently been included in cmdstanr (as of April 2023) and is not yet included in a released version as far as I can see. Hence, once the next cmdstanr is released it would be great to avoid the need in brms to convert cmdstanr models to Rstan models and compile these.

paul-buerkner commented 1 year ago

Great! Will you let me know once this has been released?

wds15 commented 1 year ago

Maybe @andrjohns can ping here right away once out...but I can watch this as well.

wds15 commented 1 year ago

or @rok-cesnovar ?

rok-cesnovar commented 1 year ago

Bumping the release version now, it's long overdue anyways. Then rebuilding the binaries. Thanks for the bump.

paul-buerkner commented 1 year ago

Thank you both!

wds15 commented 1 year ago

@rok-cesnovar did you intend to release a new cmdstanr? It would be great to have it out soonish to get the new features with a released version.

paul-buerkner commented 1 year ago

I have now implemented this in brms by just checking whether expose_functions is available to the cmdstanmodel.

paul-buerkner commented 1 year ago

I have two question about the current cmdstanr implementation. Can we ensure that the function are also returned by expose_functions in cmdstanr (or is this already the case?). And how can we set the environment to which the functions are exposed?

wds15 commented 1 year ago

Here is the definition of the function:

https://github.com/stan-dev/cmdstanr/blob/dbf41cd389dc8e2902676bf205d34e40a0f51a05/R/utils.R#L872

This gets called in the R/model.R in a way so that I would think that the functions are being compiled and placed in a slot of the model object called "functions". This slot is an environment. So

stanmodel$functions

is an environment with the model functions...but there is also the globalenv argument which places things into the global thing.

paul-buerkner commented 1 year ago

makes sense. Will add this option

paul-buerkner commented 1 year ago

should now be implemented. both arguments vectorize and env should work now.

mages commented 1 year ago

This is a timely discussion! I have a model with an ODE, which I'd like to expose to the user, but I get the following note:

expose_functions(m, vectorize = TRUE)
Error in get(funs[i], pos = fun_env, mode = "function") : 
  object 'external' of mode 'function' was not found

While the function expose_cmdstanr_functions from @rok-cesnovar works:

source("https://raw.githubusercontent.com/rok-cesnovar/misc/master/expose-cmdstanr-functions/expose_cmdstanr_functions.R")
model_path <- cmdstanr::write_stan_file(stancode(m))
expose_cmdstanr_functions(model_path, expose_to_global_env = TRUE)
Vectorize(freefall)(1:10, 0.7419, 10, 2)
 [1] 0.4000157 0.4000004 0.4000004 0.4000004 0.4000005 0.4000006 0.4000003 0.4000007 0.4000008 0.4000005

Here is the toy example code:

library(brms)

mydata <- data.frame(
  ts = c(0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 
         5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10), 
  y_obs = c(6.1, 15.5, 22.3, 26.9, 30, 36.7, 36.2, 
            42.7, 40.4, 44.7, 47.9, 41.6, 44.7,
            45.6, 47.7, 46.4, 42, 42.7, 50.5, 47.5))   

myFun <- "
  vector freefall_ode(real t,        // time
                  vector y,      // state
                  vector theta  // parameters
                  ) {
    vector[1] dydt;
    dydt[1] = 2.0 * theta[2] - theta[1] * y[1];
    return dydt;
  }
  real freefall(real t, real ys, real g, real gamma){

   vector[1] y0;
   y0[1] = ys;

   array[1] real ts;
   ts[1] = t;

   vector[2] theta;
   theta[1] = g;
   theta[2] = gamma;

   array[1] vector[1] out;

   out = ode_rk45(freefall_ode, y0, 0.0, ts, theta);

   return(out[1, 1]);
  }
"

m <- brm(bf(y_obs ~ freefall(ts, 0.7419, g, gamma), 
            gamma ~ 1, g ~ 1,
            nl = TRUE),
    prior = prior(gamma(2, 0.2), nlpar = "g", lb =0) + 
      prior(gamma(3, 2), nlpar = "gamma", lb =0) +
      prior(cauchy(0, 5), class = "sigma"),
    family = gaussian(),
    data = mydata,
    stanvars = stanvar(scode = myFun, block = "functions"),
    backend = "cmdstanr"
    )

expose_functions(m, vectorize = TRUE)

source("https://raw.githubusercontent.com/rok-cesnovar/misc/master/expose-cmdstanr-functions/expose_cmdstanr_functions.R")
model_path <- cmdstanr::write_stan_file(stancode(m))
expose_cmdstanr_functions(model_path, expose_to_global_env = TRUE)

Vectorize(freefall)(1:10, 0.7419, 10, 2)
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] brms_2.20.0   Rcpp_1.0.11   zeallot_0.1.0 dplyr_1.1.2  

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0     farver_2.1.1         loo_2.6.0            fastmap_1.1.1        tensorA_0.36.2       shinystan_2.6.0     
 [7] shinyjs_2.1.0        promises_1.2.0.1     digest_0.6.33        mime_0.12            lifecycle_1.0.3      StanHeaders_2.26.27 
[13] ellipsis_0.3.2       processx_3.8.2       magrittr_2.0.3       posterior_1.4.1      compiler_4.3.1       rlang_1.1.1         
[19] tools_4.3.1          igraph_1.5.0         utf8_1.2.3           yaml_2.3.7           data.table_1.14.8    knitr_1.43          
[25] prettyunits_1.1.1    bridgesampling_1.1-2 htmlwidgets_1.6.2    pkgbuild_1.4.2       plyr_1.8.8           cmdstanr_0.5.3      
[31] dygraphs_1.1.1.6     abind_1.4-5          miniUI_0.1.1.1       withr_2.5.0          grid_4.3.1           stats4_4.3.1        
[37] fansi_1.0.4          xts_0.13.1           xtable_1.8-4         colorspace_2.1-0     inline_0.3.19        ggplot2_3.4.2       
[43] scales_1.2.1         gtools_3.9.4         cli_3.6.1            mvtnorm_1.2-2        rmarkdown_2.23       crayon_1.5.2        
[49] generics_0.1.3       RcppParallel_5.1.7   rstudioapi_0.14      reshape2_1.4.4       rstan_2.21.8         stringr_1.5.0       
[55] shinythemes_1.2.0    bayesplot_1.10.0     parallel_4.3.1       matrixStats_1.0.0    base64enc_0.1-3      vctrs_0.6.3         
[61] Matrix_1.6-0         jsonlite_1.8.7       bookdown_0.34        callr_3.7.3          crosstalk_1.2.0      glue_1.6.2          
[67] codetools_0.2-19     ps_1.7.5             DT_0.28              distributional_0.3.2 stringi_1.7.12       gtable_0.3.3        
[73] later_1.3.1          munsell_0.5.0        tibble_3.2.1         colourpicker_1.2.0   pillar_1.9.0         htmltools_0.5.5     
[79] Brobdingnag_1.2-9    deSolve_1.36         RcppEigen_0.3.3.9.3  R6_2.5.1             evaluate_0.21        shiny_1.7.4.1       
[85] lattice_0.21-8       markdown_1.7         backports_1.4.1      threejs_0.3.3        httpuv_1.6.11        rstantools_2.3.1    
[91] coda_0.19-4          gridExtra_2.3        nlme_3.1-162         checkmate_2.2.0      xfun_0.39            zoo_1.8-12          
[97] pkgconfig_2.0.3 
paul-buerkner commented 1 year ago

Can you try again with the latest github version?

mages commented 1 year ago

This works now! Thank you Paul!

expose_functions(m, vectorize = TRUE)
freefall(1:10, 0.7419, 10, 2)
 [1] 0.4000157 0.4000004 0.4000004 0.4000004 0.4000005 0.4000006 0.4000003 0.4000007 0.4000008 0.4000005
wds15 commented 1 year ago

Cool that this is now tested!

mages commented 1 year ago

Unfortunately, for another example I run into a problem with expose_function again, while expose_cmdstanr_functions works. This time the error message is:

Error in Rcpp::sourceCpp(code = code, env = env, verbose = verbose) : 
  Error 1 occurred building shared library.

Here is my test case:

library(brms)
createSampleData <- function(seed=NULL){
  set.seed(seed) 
  t <- 1:20
  premium <- 1000
  ELR <- rlnorm(1, log(0.6), 0.1)
  theta <- rnorm(1, 0.2, 0.02)
  mu_cum <- ELR * (1 - exp(-t * theta))
  mu_incr <- mu_cum -  ELR * (1 - exp(-(t-1) * theta))
  sig <- abs(rstudent_t(1, 10, 0.1, 0.1))
  incr_lr <- rlnorm(length(t), log(mu_incr), sig)
  incr <- as.integer(incr_lr*premium)
  cum_lr <- cumsum(incr_lr)
  cum_lr_direct <-  rlnorm(length(t), log(mu_cum), sig)
  dat = data.frame(t, premium, incr, incr_lr, cum_lr, cum_lr_direct)
  return(dat)
}
dat <- createSampleData(seed=4)
dat$delta <- 1
dat$deltaf <- "paid"
dat2 <- dat
dat2$delta <- 0
dat2$deltaf <- "os"
dat <- rbind(dat, dat2)

myCompFun <- "
// ODE System
vector ode_claimsprocess(real t, vector y,  vector theta){

  vector[3] dydt;
  // Define ODEs
  dydt[1] = - theta[1] * y[1];
  dydt[2] = theta[1] * theta[3] * y[1] - theta[2] * y[2];
  dydt[3] = theta[2] * theta[4] * y[2];

  return dydt;
}

//Priors & Solver
real int_claimsprocess(real t, real ker, real kp, 
                       real RLR, real RRF, real delta){
  vector[3] y0;
  array[1] vector[3] y;
  vector[4] theta;
  theta[1] = ker; theta[2] = kp;
  theta[3] = RLR; theta[4] = RRF;
  // Set initial values
  y0[1] = 1; y0[2] = 0; y0[3] = 0;

  y = ode_rk45(ode_claimsprocess, y0, 0, rep_array(t, 1), theta);  

  return (y[1, 2] * (1 - delta) + y[1, 3] * delta);
}

//Application to OS and Incremental Paid Data
real claimsprocess(real t, real devfreq, real ker, real kp, 
                   real RLR, real RRF, real delta){
    real out; 
    out = int_claimsprocess(t, ker, kp, RLR, RRF, delta);
    if( (delta > 0) && (t > devfreq) ){ // paid greater dev period 1
    // incremental paid
     out = out - int_claimsprocess(t - devfreq, ker, kp, RLR, RRF, delta);
    }
    return(out);
}
"

frml <- bf(
  incr_lr ~ eta,
  nlf(eta ~ log(claimsprocess(t, 1.0, ker, kp, RLR, RRF, delta))),
  nlf(RLR ~ 0.6 * exp(oRLR * 0.1)), 
  nlf(RRF ~ 0.95 * exp(oRRF * 0.05)),
  nlf(ker ~ 1.7 * exp(oker * 0.02)), 
  nlf(kp ~ 0.5 * exp(okp * 0.05)),
  oRLR ~ 1, oRRF ~ 1, oker ~ 1, okp ~ 1, sigma ~ 0 + deltaf,
  nl = TRUE)

mypriors <- c(prior(normal(0, 1), nlpar = "oRLR"),
              prior(normal(0, 1), nlpar = "oRRF"),
              prior(normal(0, 1), nlpar = "oker"),
              prior(normal(0, 1), nlpar = "okp"),
              prior(normal(0.15, 0.025), class = "b", 
                    coef="deltafos", dpar= "sigma"),
              prior(normal(0.1, 0.02), class = "b", 
                    coef="deltafpaid", dpar = "sigma"))

prior_compartment_lognorm <- brm(frml, data = dat, 
                                 family = brmsfamily("lognormal", link_sigma = "log"),
                                 prior = mypriors, backend = "cmdstanr",
                                 stanvars = stanvar(scode = myCompFun, block = "functions"),
                                 sample_prior = "only")

expose_functions(prior_compartment_lognorm, vectorize = TRUE)
## Error in Rcpp::sourceCpp(code = code, env = env, verbose = verbose) : 
##  Error 1 occurred building shared library.

source("https://raw.githubusercontent.com/rok-cesnovar/misc/master/expose-cmdstanr-functions/expose_cmdstanr_functions.R")
model_path <- cmdstanr::write_stan_file(stancode(prior_compartment_lognorm))
expose_cmdstanr_functions(model_path, expose_to_global_env = TRUE)
Vectorize(claimsprocess)(1:10, 1, 1, 1, 1, 1, 1)
## [1] 0.2642412285 0.3297531086 0.2068575411 0.1075700645 0.0511504527 0.0230763059 0.0100561000 0.0042758214 0.0017850282 0.0007346893
paul-buerkner commented 1 year ago

Strange. your example works for me. perhaps restart R and/or rebuild cmdstan?

mages commented 1 year ago

It is strange. I have restarted R and rebuilt cmdstan but to no avail. However, using expose_cmdstanr_functions from @rok-cesnovar works.

Here are more details from the error message I am getting:

file4bce30c53081.cpp:346:126: error: expected ';' after top level declarator
  Eigen::Matrix<stan::promote_args_t<double, stan::base_type_t<double>,                 stan::base_type_t<double>>,-1,1> ode_ claimsprocess(const double& t, const double& devfreq, const double& ker,               const double& kp, const double& RLR, const double& RRF,               const double& delta) {   return x39model_9687b197518cd0a9c4e7b8af36d2c5b4_modelx39_namespace::claimsprocess(            t, devfreq, ker, kp, RLR, RRF, delta, &Rcpp::Rcout); }
                                                                                                                             ^
                                                                                                                             ;
file4bce30c53081.cpp:385:120: error: redefinition of 'ode_'
Eigen::Matrix<stan::promote_args_t<double, stan::base_type_t<double>,                 stan::base_type_t<double>>,-1,1> ode_ claimsprocess(const double& t, const double& devfreq, const double& ker, const double& kp, const double& RLR, const double& RRF, const double& delta);
                                                                                                                       ^
file4bce30c53081.cpp:346:122: note: previous definition is here
  Eigen::Matrix<stan::promote_args_t<double, stan::base_type_t<double>,                 stan::base_type_t<double>>,-1,1> ode_ claimsprocess(const double& t, const double& devfreq, const double& ker,               const double& kp, const double& RLR, const double& RRF,               const double& delta) {   return x39model_9687b197518cd0a9c4e7b8af36d2c5b4_modelx39_namespace::claimsprocess(            t, devfreq, ker, kp, RLR, RRF, delta, &Rcpp::Rcout); }
                                                                                                                         ^
file4bce30c53081.cpp:385:124: error: expected ';' after top level declarator
Eigen::Matrix<stan::promote_args_t<double, stan::base_type_t<double>,                 stan::base_type_t<double>>,-1,1> ode_ claimsprocess(const double& t, const double& devfreq, const double& ker, const double& kp, const double& RLR, const double& RRF, const double& delta);
                                                                                                                           ^
                                                                                                                           ;
file4bce30c53081.cpp:397:34: error: use of undeclared identifier 'claimsprocess'
    rcpp_result_gen = Rcpp::wrap(claimsprocess(t, devfreq, ker, kp, RLR, RRF, delta));
paul-buerkner commented 1 year ago

Hmm this looks more like a cmdstanr rather than brms question now, since brms merely calls the $expose_functions method of cmdstanr objects.

mages commented 1 year ago

I think you are correct, as I can replicate the issue with cmdstanr now:

library(cmdstanr)

stancode <- "
// generated with brms 2.20.0
functions {
  // ODE System
  vector odeclaimsprocess(real t, vector y, vector theta) {
    vector[3] dydt;
    // Define ODEs
    dydt[1] = -theta[1] * y[1];
    dydt[2] = theta[1] * theta[3] * y[1] - theta[2] * y[2];
    dydt[3] = theta[2] * theta[4] * y[2];

    return dydt;
  }

  //Priors & Solver
  real intclaimsprocess(real t, real ker, real kp, real RLR, real RRF,
                        real delta) {
    vector[3] y0;
    array[1] vector[3] y;
    vector[4] theta;
    theta[1] = ker;
    theta[2] = kp;
    theta[3] = RLR;
    theta[4] = RRF;
    // Set initial values
    y0[1] = 1;
    y0[2] = 0;
    y0[3] = 0;

    y = ode_rk45(odeclaimsprocess, y0, 0, rep_array(t, 1), theta);

    return y[1, 2] * (1 - delta) + y[1, 3] * delta;
  }

  //Application to OS and Incremental Paid Data
  real claimsprocess(real t, real devfreq, real ker, real kp, real RLR,
                     real RRF, real delta) {
    real out;
    out = intclaimsprocess(t, ker, kp, RLR, RRF, delta);
    if ((delta > 0) && (t > devfreq)) {
      // paid greater dev period 1
      // incremental paid
      out = out - intclaimsprocess(t - devfreq, ker, kp, RLR, RRF, delta);
    }
    return out;
  }
}
data {
  int<lower=1> N; // total number of observations
  vector[N] Y; // response variable
  int<lower=1> K_oRLR; // number of population-level effects
  matrix[N, K_oRLR] X_oRLR; // population-level design matrix
  int<lower=1> K_oRRF; // number of population-level effects
  matrix[N, K_oRRF] X_oRRF; // population-level design matrix
  int<lower=1> K_oker; // number of population-level effects
  matrix[N, K_oker] X_oker; // population-level design matrix
  int<lower=1> K_okp; // number of population-level effects
  matrix[N, K_okp] X_okp; // population-level design matrix
  // covariates for non-linear functions
  array[N] int C_eta_1;
  vector[N] C_eta_2;
  int<lower=1> K_sigma; // number of population-level effects
  matrix[N, K_sigma] X_sigma; // population-level design matrix
  int prior_only; // should the likelihood be ignored?
}
transformed data {

}
parameters {
  vector[K_oRLR] b_oRLR; // regression coefficients
  vector[K_oRRF] b_oRRF; // regression coefficients
  vector[K_oker] b_oker; // regression coefficients
  vector[K_okp] b_okp; // regression coefficients
  vector[K_sigma] b_sigma; // regression coefficients
}
transformed parameters {
  real lprior = 0; // prior contributions to the log posterior
  lprior += normal_lpdf(b_oRLR | 0, 1);
  lprior += normal_lpdf(b_oRRF | 0, 1);
  lprior += normal_lpdf(b_oker | 0, 1);
  lprior += normal_lpdf(b_okp | 0, 1);
  lprior += normal_lpdf(b_sigma[1] | 0.15, 0.025);
  lprior += normal_lpdf(b_sigma[2] | 0.1, 0.02);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] nlp_oRLR = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_oRRF = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_oker = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_okp = rep_vector(0.0, N);
    // initialize non-linear predictor term
    vector[N] nlp_RLR;
    // initialize non-linear predictor term
    vector[N] nlp_RRF;
    // initialize non-linear predictor term
    vector[N] nlp_ker;
    // initialize non-linear predictor term
    vector[N] nlp_kp;
    // initialize non-linear predictor term
    vector[N] nlp_eta;
    // initialize non-linear predictor term
    vector[N] mu;
    // initialize linear predictor term
    vector[N] sigma = rep_vector(0.0, N);
    nlp_oRLR += X_oRLR * b_oRLR;
    nlp_oRRF += X_oRRF * b_oRRF;
    nlp_oker += X_oker * b_oker;
    nlp_okp += X_okp * b_okp;
    sigma += X_sigma * b_sigma;
    for (n in 1 : N) {
      // compute non-linear predictor values
      nlp_RLR[n] = 0.6 * exp(nlp_oRLR[n] * 0.1);
    }
    for (n in 1 : N) {
      // compute non-linear predictor values
      nlp_RRF[n] = 0.95 * exp(nlp_oRRF[n] * 0.05);
    }
    for (n in 1 : N) {
      // compute non-linear predictor values
      nlp_ker[n] = 1.7 * exp(nlp_oker[n] * 0.02);
    }
    for (n in 1 : N) {
      // compute non-linear predictor values
      nlp_kp[n] = 0.5 * exp(nlp_okp[n] * 0.05);
    }
    for (n in 1 : N) {
      // compute non-linear predictor values
      nlp_eta[n] = log(claimsprocess(C_eta_1[n], 1, nlp_ker[n], nlp_kp[n],
                                     nlp_RLR[n], nlp_RRF[n], C_eta_2[n]));
    }
    for (n in 1 : N) {
      // compute non-linear predictor values
      mu[n] = nlp_eta[n];
    }
    sigma = exp(sigma);
    target += lognormal_lpdf(Y | mu, sigma);
  }
  // priors including constants
  target += lprior;
}
generated quantities {

}
"
fn <- write_stan_file(stancode)
mod <- cmdstan_model(fn)
mod$expose_functions(global = TRUE)
# Error in Rcpp::sourceCpp(code = code, env = env, verbose = verbose) : 
#  Error 1 occurred building shared library.
wds15 commented 1 year ago

Can u post this as cmdstanr issue please?

mages commented 1 year ago

Just adding that investigation has shown that the above behaviour was caused by legacy cmdstanr installations. Some were installed in ~/.cmdstan others in ~/.cmdstanr. Deleting all old installations and reinstalling CmdStanR resolved the issue.