Open jameshay218 opened 2 years ago
Maybe helpful to see the C++ function:
// [[Rcpp::export]]
NumericVector likelihood_fast(const NumericVector &expected, const NumericVector &obs, const NumericVector &pars){
int total_cts = expected.size();
NumericVector ret(total_cts);
const double sd = pars["error"];
const double max_ct = 40.0;
const double log_const = log(0.5);
const double den = sd*M_SQRT2;
for(int i = 0; i < total_cts; ++i){
if(obs[i] < max_ct && obs[i] >= 0.0){
ret[i] = -log(sd*sqrt(2*M_PI)) - 0.5*pow((obs[i] - expected[i])/sd, 2);
} else if(obs[i] >= max_ct) {
ret[i] = log_const + log(erfc((max_ct - expected[i])/den));
} else {
ret[i] = log_const + log(1.0 + erf((0.0 - expected[i])/den));
}
}
return(ret);
}
Actually small update, I tried just adding a dummy Rcpp function call in the likelihood and this also breaks it:
// [[Rcpp::export]]
double ct_likelihood_fast_tmp(double x){
return(x);
}
Hi @jameshay218,
Interesting!...
Can I check first that, when specifying the cpp likelihood it is with the same file format defined by the function: drjacoby::cpp_template()
? You'll note that the template has an associated SEXP create_xptr()
function that is vital to get everything to play nicely.
Sorry I realize I was unclear from my first message -- this is just called from within the main likelihood function, which is in R eg., as if I were substituting the call to dnorm
with james_custom_dnorm_cpp
. So the majority of the overall likelihood function is in R. Here is a minimal example based on the vignette:
library(drjacoby)
library(Rcpp)
cppFunction("NumericVector dnorm_cpp(const NumericVector &expected, const double &obs, const NumericVector &pars){
int n_ret = expected.size();
NumericVector ret(n_ret);
const double sd = pars[0];
const double max_ct = 40.0;
const double log_const = log(0.5);
const double den = sd*M_SQRT2;
for(int i = 0; i < n_ret; ++i){
ret[i] = -log(sd*sqrt(2*M_PI)) - 0.5*pow((obs - expected[i])/sd, 2);
}
return(ret);
}")
set.seed(1)
# define true parameter values
mu_true <- 3
sigma_true <- 2
# draw example data
data_list <- list(x = rnorm(10, mean = mu_true, sd = sigma_true))
# define parameters dataframe
df_params <- define_params(name = "mu", min = -10, max = 10,
name = "sigma", min = 0, max = Inf)
# define log-likelihood function
r_loglike <- function(params, data, misc) {
# extract parameter values
mu <- params["mu"]
sigma <- params["sigma"]
# calculate log-probability of data
ret <- sum(dnorm(data$x, mean = mu, sd = sigma, log = TRUE))
# return
return(ret)
}
# define log-likelihood function
r_loglike_cpp <- function(params, data, misc) {
# extract parameter values
mu <- params["mu"]
sigma <- params["sigma"]
# calculate log-probability of data
ret <- sum(dnorm_cpp(data$x, mu, sigma))
# return
return(ret)
}
# define log-prior function
r_logprior <- function(params, misc) {
# extract parameter values
mu <- params["mu"]
sigma <- params["sigma"]
# calculate log-prior
ret <- dunif(mu, min = -10, max = 10, log = TRUE) +
dlnorm(sigma, meanlog = 0, sdlog = 1.0, log = TRUE)
# return
return(ret)
}
## Run example
mcmc <- run_mcmc(data = data_list,
df_params = df_params,
loglike = r_loglike,
logprior = r_logprior,
burnin = 1e3,
samples = 1e3,
pb_markdown = TRUE)
plot_par(mcmc, show="mu")
## Run example with custom Cpp dnorm
mcmc_cpp <- run_mcmc(data = data_list,
df_params = df_params,
loglike = r_loglike_cpp,
logprior = r_logprior,
burnin = 1e3,
samples = 1e3,
pb_markdown = TRUE)
plot_par(mcmc_cpp, show="mu")
Interestingly, I also get an issue when I use extraDistr::dhnorm
in the prior function.
Thanks James that is helpful.
It's not something we've tried I don't think, but is obviously not happy! I'll make an issue to look into this, at very least it would be nice if it threw up a warning.
All our testing has been with an "all in R" or "all in c++" approach. Using the drjacoby::cpp_template()
you can split out functions to call from the overall likelihood (so here you could define a loglike_cpp()
which called dnorm_cpp()
) but they'd both need to be in the same c++ source file.
As you say, interesting that extraDistr::dhnorm
works. I expect if you wrapped your dnorm_cpp()
into a proper package and loaded that it might work too, but that seems a bit of a faff.
Thanks Pete.
It's also interesting that if I print out the return value of dnorm_cpp
in r_loglike_cpp
, it shows the correct values. So the function is running correctly, suggesting that calling an Rcpp function is interfering with the drjacoby C++ environment (maybe some memory issues, or something odd with jumping from C++->R->C++->R->C++ each iteration). Seems like a totally reasonable limitation if there's no easy fix -- all in R or all in C++. I was hoping to hybridize because it is an ODE model, and I wasn't aware of any quick ways to put the whole thing in C++ without learning a whole new pipeline.
Yes, was just looking at the same thing. I'll have a chat with Bob to check there isn't a quick fix.
Still working on this. Posting a very simple reprex here for reference:
library(drjacoby)
# Source an empty cpp function
Rcpp::cppFunction("void break_mcmc(){}", verbose=TRUE)
# define log-likelihood function
ll <- function(params, data, misc) {
# Option to call the empty cpp function
if(misc$broken){
break_mcmc()
}
# calculate log-probability of data
ret <- sum(dnorm(data$x, mean = params["mu"], sd = 2, log = TRUE))
# return
return(ret)
}
# define log-prior function
lp <- function(params, misc) {
return(0)
}
# define true parameter values
mu_true <- 3
sigma_true <- 2
# draw example data
set.seed(1234)
data_list <- list(x = rnorm(10, mean = mu_true, sd = sigma_true))
# define parameters dataframe
df_params <- define_params(name = "mu", min = -10, max = 10, init = 3)
# Working
set.seed(1234)
mcmc_cpp <- run_mcmc(data = data_list,
df_params = df_params,
loglike = ll,
logprior = lp,
burnin = 500,
samples = 500,
chains = 1,
silent = TRUE,
misc = list(broken = FALSE))
# Returns MCMC chain for mu that looks fine:
plot(mcmc_cpp$output$mu, t = "l")
# Not working
set.seed(1234)
mcmc_cpp2 <- run_mcmc(data = data_list,
df_params = df_params,
loglike = ll,
logprior = lp,
burnin = 500,
samples = 500,
chains = 1,
silent = TRUE,
misc = list(broken = TRUE))
# Returns MCMC chain for mu that does not look fine (either monotonically increases or decreases):
plot(mcmc_cpp2$output$mu, t = "l")
Ok. Seems to be a RNG issue leading to issues when proposing a new theta value after calling the hybrid log_likelihood function. Similar issue here. Working on testing a fix on this branch
With further reading, an almost identical bug found for a MCMC applciation detailed here
The branch mentioned above seems to now work for your simple reprex @jameshay218. Would it be possible for you to check using your full, more complex, failing case at some point?
remotes::install_github("mrc-ide/drjacoby@bug/rng")
Really great package and documentation! I've finally got around to seeing how drjacoby will do with a hierarchical ODE model (my sampler is struggling).
Hopefully a quick question, but will provide a minimal example if not...
I am running a pared down version of my full model, and drjacoby seems to be working nicely. However, I have two identical custom likelihood functions: one written in R and one in Cpp with an Rcpp interface. See below: if I just swap from
likelihood
tolikelihood_fast
, the sampler tries to run off into weird parts of parameter space. Works totally fine withlikelihood
.Is there something about the way the C++ code is organized that prohibits Rcpp function use like this? Maybe I'm accidentally accessing/overwriting some variables in the C++ environment that I shouldn't be?
The reason I think it's some conflict in C++ is that the breakdown occurs even if I call the
likelihood_fast
function but don't use its output. eg., sampling from the prior by having the likelihood function return 0 at the end.