quentingronau / bridgesampling

R package for bridge sampling

bridgesampling on cmdstanr #27

Open bnicenboim opened 3 years ago

bnicenboim commented 3 years ago

Hi, I'm trying to use bridgesampling with the output of cmdstanr (https://mc-stan.org/cmdstanr/index.html). Would you consider supporting it? In the meantime, I tried to convert the cmdstanr object to a stanfit:

stanfit <- rstan::read_stan_csv(fit$output_files())

But I get:

bridge_sampler(stanfit)
Error in .local(object, ...) :
  the model object is not created or not valid

singmann commented 3 years ago

At the moment it looks as if we cannot support cmdstanr easily. The key for bridgesampling support is a function that evaluates a model's log posterior for a set of samples. For rstan this is available through the log_prob function, but no such function seems to exist for cmdstanr.

However, you might be able to do the following:

Can you let us know if this works?
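
To make the log-posterior requirement mentioned above concrete, here is a minimal base R sketch of what such a function computes. It is not the rstan or bridgesampling implementation. It assumes a simple Bernoulli model (theta ~ beta(1, 1), y ~ bernoulli(theta)) with theta mapped to the unconstrained logit scale, and the names log_posterior and log_posterior_draws are made up for illustration.

```r
# Illustrative data: ten Bernoulli trials with two successes.
y <- c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1)

# Unnormalized log posterior for ONE draw u on the unconstrained (logit) scale.
# Assumption: theta ~ beta(1, 1) and y ~ bernoulli(theta).
log_posterior <- function(u, y) {
  theta     <- plogis(u)                        # map u back to (0, 1)
  log_lik   <- sum(dbinom(y, size = 1, prob = theta, log = TRUE))
  log_prior <- dbeta(theta, 1, 1, log = TRUE)
  log_jac   <- log(theta) + log1p(-theta)       # log |d theta / d u|
  log_lik + log_prior + log_jac
}

# Evaluate the log posterior for a whole set of draws, one value per draw.
log_posterior_draws <- function(us, y) {
  vapply(us, log_posterior, numeric(1), y = y)
}

draws <- c(-2, -1, 0)                           # arbitrary unconstrained values
lp <- log_posterior_draws(draws, y)
print(lp)
```

Bridge sampling needs to call a function like this at arbitrary parameter values, not only at the saved posterior draws, which is why a fit object that holds only the samples is not enough.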

bnicenboim commented 3 years ago

Thanks for the fast answer! (By the way, stan() didn't let me compile with zero iterations; is there another command?) The issue with recompiling models is that rstan and cmdstanr are usually at different versions, so things that don't work in rstan (yet) do work with cmdstanr. That said, this toy example works:

library(cmdstanr)
library(bridgesampling)
file <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
mod <- cmdstan_model(file)
data_list <- list(N = 10, y = c(0,1,0,0,0,0,0,0,0,1))

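fit <- mod$sample(
  data = data_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4
)

# Read the CmdStan CSV output into a stanfit object (holds the samples only)
stanfit <- rstan::read_stan_csv(fit$output_files())
# Compile the same model with rstan so that log_prob is available
new_model <- rstan::stan(file, iter = 1, chains = 1, data = data_list)
bridge_sampler(samples = stanfit, stanfit_model = new_model)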

This real example doesn't:

m_eval <- cmdstan_model("./eval.stan")
f_eval_bayes <- m_eval$sample(data = data_bayes, parallel_chains = 4)
bayes_stanfit <- rstan::read_stan_csv(f_eval_bayes$output_files())
new_model_b <- rstan::stan("./eval.stan", iter = 1, chains = 1, data = data_bayes)
bridge_sampler(samples = stanfit, stanfit_model = new_model_b)
# Error in object@.MISC$stan_fit_instance$unconstrain_pars(pars) : 
#  Exception: Variable P_pp missing  (in 'model24131573b97c5_eval' at line 16)

data_model.zip

singmann commented 3 years ago

I guess if the cmdstanr model uses some Stan feature that is not yet supported in rstan, there is not much that can be done at this point in time. Without the ability to evaluate the model with new parameters (i.e., the log_prob function) bridge sampling cannot be done.

And if iter = 0 does not work, does chains = 0 maybe work?

singmann commented 3 years ago

For another example see: https://discourse.mc-stan.org/t/how-to-do-bridge-sampling-and-calculate-bayes-factor-with-brms-with-cmdstanr-backend/18873/10

bnicenboim commented 3 years ago

Yeah, this is similar to my example 1, but it didn't work for example 2.

crsh commented 3 years ago

Hi there, I just stumbled across this issue and I think Paul Bürkner has recently implemented a fix for this in brms. If I understand correctly, the idea here is to recompile the model using rstan and add it to the fit object (see update_misc_env() and add_rstan_model()). Could something like this be done in bridgesampling as well?

singmann commented 3 years ago

Hmm, the add_rstan_model() function in brms looks straightforward enough, so this could in principle be added. However, I wonder whether the cmdstanr package or rstan would not be more suitable for a function that does nothing other than convert a cmdstanr object to an rstan object. Because this is what would be needed: convert a cmdstanr object to an rstan object and then pass the new rstan object to bridgesampling (i.e., it is not really a bridgesampling issue).

Furthermore, I was under the impression that the main reason for using cmdstanr is to use new features that are not yet implemented in rstan. So I would expect this recompilation to fail when these new features are used. In other words, if you could compile and fit with rstan, why not do that in the first place and avoid the problem?

crsh commented 3 years ago

I use it because, at least in my case, it seems to compile and sample faster and, according to the documentation, it is more stable, has a smaller memory overhead, and has fewer dependencies.

singmann commented 3 years ago

I see. However, given that this solution would still require rstan compilation on top of cmdstanr compilation, this seems to me to be a low-priority addition. This means that, given my time constraints, I do not see myself adding this to bridgesampling soon, but I am happy to take a look at a pull request.

maxbiostat commented 1 year ago

Would the (somewhat) recent updates to cmdstanr make it easier for bridgesampling to work with it?

bnicenboim commented 1 year ago

Hi, I see a branch called cmdstanr. I was wondering if there is any news on this.

franfrutos commented 2 months ago

I have seen that many advances have been made in integrating bridgesampling with CmdStan! Is there an available (experimental) implementation of this?

crsh commented 2 months ago

Hi Francisco, I think this implementation should work: https://github.com/quentingronau/bridgesampling/pull/38

Let me know if you experience any problems.

dbraun31 commented 1 month ago

I'm very interested in bridgesampling and obtaining Bayes Factors for models fit with cmdstanr. I tried your implementation quickly but couldn't get it to work. I wonder if I did it incorrectly? Running on Ubuntu 23.04:

# Install from dev branch
devtools::install_github('crsh/bridgesampling@cmdstanr-crsh')
library(cmdstanr)
library(bridgesampling)

m_alt <- cmdstanr::cmdstan_model('ttest_alt.stan')

alt_fit <- m_alt$sample(
    data=d,
    chains = 3,
    parallel_chains = 3,
    iter_warmup = 500,
    iter_sampling = 1000
)

# Log marginal likelihood
lml <- bridge_sampler(alt_fit)

Resulted in:

Compiling additional model methods...
SUNDIALS needs to be compiled with -fPIC when exposing functions or model methods on Linux.
Updating your make/local file to include -fPIC and rebuilding CmdStan now...
ar: creating stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_nvecserial.a

ar: creating stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_cvodes.a

ar: creating stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_idas.a

ar: creating stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_kinsol.a

/home/dave/.cmdstan/cmdstan-2.35.0/stan/lib/stan_math/lib/tbb_2020.3/build/Makefile.tbb:28: CONFIG: cfg=release arch=intel64 compiler=gcc target=linux runtime=cc12.3.0_libc2.37_kernel6.2.0

In file included from ../tbb_2020.3/src/tbb/concurrent_hash_map.cpp:17:
../tbb_2020.3/include/tbb/concurrent_hash_map.h:347:23: warning: ‘template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator’ is deprecated [-Wdeprecated-declarations]
  347 |         : public std::iterator<std::forward_iterator_tag,Value>
      |                       ^~~~~~~~
In file included from /usr/include/c++/12/bits/stl_construct.h:61,
                 from /usr/include/c++/12/memory:64,
                 from ../tbb_2020.3/include/tbb/tbb_stddef.h:452,
                 from ../tbb_2020.3/include/tbb/concurrent_hash_map.h:23:
/usr/include/c++/12/bits/stl_iterator_base_types.h:127:34: note: declared here
  127 |     struct _GLIBCXX17_DEPRECATED iterator
      |                                  ^~~~~~~~

cc1plus: note: unrecognized command-line option ‘-Wno-unknown-warning-option’ may have been intended to silence earlier diagnostics

In file included from ../tbb_2020.3/src/tbb/concurrent_queue.cpp:22:
../tbb_2020.3/include/tbb/internal/_concurrent_queue_impl.h:749:21: warning: ‘template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator’ is deprecated [-Wdeprecated-declarations]
  749 |         public std::iterator<std::forward_iterator_tag,Value> {
      |                     ^~~~~~~~
In file included from /usr/include/c++/12/bits/stl_construct.h:61,
                 from /usr/include/c++/12/memory:64,
                 from ../tbb_2020.3/include/tbb/tbb_stddef.h:452,
                 from ../tbb_2020.3/src/tbb/concurrent_queue.cpp:17:
/usr/include/c++/12/bits/stl_iterator_base_types.h:127:34: note: declared here
  127 |     struct _GLIBCXX17_DEPRECATED iterator
      |                                  ^~~~~~~~

../tbb_2020.3/include/tbb/internal/_concurrent_queue_impl.h:1013:21: warning: ‘template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator’ is deprecated [-Wdeprecated-declarations]
 1013 |         public std::iterator<std::forward_iterator_tag,Value> {
      |                     ^~~~~~~~
/usr/include/c++/12/bits/stl_iterator_base_types.h:127:34: note: declared here
  127 |     struct _GLIBCXX17_DEPRECATED iterator
      |                                  ^~~~~~~~

cc1plus: note: unrecognized command-line option ‘-Wno-unknown-warning-option’ may have been intended to silence earlier diagnostics

CmdStan has been rebuilt, continuing with model compilation...
Error: Error in JSON parsing 
at offset 0: 
The document is empty.

franfrutos commented 1 month ago

Hi!

Thanks to @crsh for the implementation and @dbraun31 for the code! I tested similar code on Windows and had the same problem, which I think is unrelated to the bridgesampling implementation in #38. The problem is related to enabling the additional model methods needed for bridge sampling: you need to have the Rcpp package installed and the C++ toolchain set up properly. This is the code I used as a reproducible example:

# Install from dev branch
devtools::install_github('crsh/bridgesampling@cmdstanr-crsh', force = TRUE)

# If needed, install Rcpp
if (!requireNamespace("Rcpp", quietly = TRUE)) install.packages("Rcpp")

# load packages
library(cmdstanr)
library(bridgesampling)

file <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
mod <- cmdstan_model(file, force_recompile = TRUE)

d <- list(N = 10, y = c(0,1,0,0,0,0,0,0,0,1))

fit <- mod$sample(
  data = d,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 500
)

# Log marginal likelihood
lml <- bridge_sampler(fit)

And the original output:

Compiling additional model methods...
Error in inDL(x, as.logical(local), as.logical(now), ...) : 
  unable to load shared object 'C:/Users/User/AppData/Local/Temp/RtmpmeQFhy/sourceCpp-x86_64-w64-mingw32-1.0.13/sourcecpp_43a443123350/sourceCpp_2.dll':

After following the CmdStan installation guide and updating the PATH variable, everything worked.

Output:

> lml <- bridge_sampler(fit)
Compiling additional model methods...
Iteration: 1
Iteration: 2
Iteration: 3
Iteration: 4
Iteration: 5
Warning message:
effective sample size cannot be calculated, has been replaced by number of samples. 
> print(lml)
Bridge sampling estimate of the log marginal likelihood: -6.20491
Estimate obtained in 5 iteration(s) via method "normal".
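
As a rough sanity check on the estimate above: assuming the stock bernoulli.stan that ships with CmdStan (theta ~ beta(1, 1), y ~ bernoulli(theta)), the marginal likelihood of this data set has a closed form, so it can be computed in base R without any sampling. The variable names below are illustrative only.

```r
# Same data as in the reproducible example.
y <- c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1)

# Assumed prior: theta ~ beta(a, b) with a = b = 1.
a <- 1
b <- 1

# With a beta prior and a Bernoulli likelihood, the marginal likelihood is
# B(a + successes, b + failures) / B(a, b).
successes <- sum(y)
failures  <- length(y) - successes
exact_lml <- lbeta(a + successes, b + failures) - lbeta(a, b)

print(exact_lml)
```

This evaluates to log(1/495), roughly -6.2046, which is in line with the bridge sampling estimate of -6.20491.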