This is more likely due to my incompetence rather than an actual issue with the library but I've noticed that no matter how restrictive priors I give to stan_glm (my use case is logistic regression), the resulting model doesn't differ from a regular frequentist logistic regression. If I use either core rstan or blmr, I am able to influence the resulting model using informative priors.
My use case is basically that I have data based on random sampling which is outdated and newer data which has heavy selection bias and I aim to use the data based on random sampling to create a prior for intercept that should optimally combat the selection bias (I also use propensity scores but that's another story)
Below is a code with toy data with one predictor variable. Essentially same model is built four times, one using the regular glm-function and one Bayesian version each with rstan, rstanarm and blmr.
int N; // number of trials
int N_test; // number of trials test data
int y [N]; // number of hits
vector [N] x; // predictive variable
vector[N_test] x_test;
}
parameters {
// The (unobserved) model parameters that we want to recover
real alpha;
real beta;
}
model {
// A logistic regression model relating the x to y
y ~ bernoulli_logit(alpha + beta * x);
// Prior models for the unobserved parameters
alpha ~ normal(log(0.07/(1-0.07)), 0.005);
beta ~ normal(0, 0.5);
As you can see, fit1 and fit2 (glm, rstanarm) correspond to each other, while fit3 and fit4 (rstan, blmr) are also a pair. In my mind, the rstanarm result should be similar to the rstan and blmr fits but something goes awry.
As said, there likely are errors in how I set up the model. Coming from social sciences I am not a particularly accomplished statistician even in frequentist tradition and I am pretty inexperienced in Bayesian methods.
I prefer rstanarm as an interface to Stan over others so would like to use it :)
This is more likely due to my incompetence rather than an actual issue with the library but I've noticed that no matter how restrictive priors I give to stan_glm (my use case is logistic regression), the resulting model doesn't differ from a regular frequentist logistic regression. If I use either core rstan or blmr, I am able to influence the resulting model using informative priors.
My use case is basically that I have data based on random sampling which is outdated and newer data which has heavy selection bias and I aim to use the data based on random sampling to create a prior for intercept that should optimally combat the selection bias (I also use propensity scores but that's another story)
Below is a code with toy data with one predictor variable. Essentially same model is built four times, one using the regular glm-function and one Bayesian version each with rstan, rstanarm and blmr.
`library(rstanarm) library(rstan) library(brms)
create prior data
random_y <- rbinom(3000, 1, prob=0.07)
define prior mean
mu <- prop.table(table(random_y))[2]
data for modeling
y <- rbinom(5000, 1, prob=0.35)
x <- rnorm(5000, mean=y, sd=2)
df <- data.frame(x, y)
new observations for prediction
newObs <- data.frame(rnorm(100, mean=0.5, sd=3))
names(newObs)[1] <- "x"
frequentist logistic regression
fit1 <- glm(y ~ x, data = df, family="binomial")
bayesian logistic regression using rstanarm
fit2 <- stan_glm(y ~ x, data=df, family="binomial", prior_intercept = student_t(location=log(mu/(1-mu)), scale=0.05), prior=student_t(location=0, scale=0.5))
with Stan
stanMod <- "data {
int N; // number of trials int N_test; // number of trials test data int y [N]; // number of hits
vector [N] x; // predictive variable
vector[N_test] x_test;
}
parameters {
// The (unobserved) model parameters that we want to recover real alpha; real beta;
}
model {
// A logistic regression model relating the x to y y ~ bernoulli_logit(alpha + beta * x);
// Prior models for the unobserved parameters alpha ~ normal(log(0.07/(1-0.07)), 0.005); beta ~ normal(0, 0.5);
}
generated quantities { vector[N_test] y_test; for(i in 1:N_test) { y_test[i] = alpha + beta*x_test[i]; } }"
StanDat <- list(N=nrow(df),x=df$x,y=df$y, N_test=nrow(newObs),x_test=newObs$x)
fit3 <- stan(model_code = stanMod, data=StanDat, chains=4)
with brms
brmPriors <- c(set_prior("normal(0,0.5)", class = "b", coef = "x"), set_prior("normal(log(0.07/(1-0.07)),0.05", class = "Intercept"))
fit4 <- brm(y~x, data=df, family = "bernoulli", prior=brmPriors)
results
coef(fit1) coef(fit2) mean(extractStan$alpha) mean(extractStan$beta) fixef(fit4)`
As you can see, fit1 and fit2 (glm, rstanarm) correspond to each other, while fit3 and fit4 (rstan, blmr) are also a pair. In my mind, the rstanarm result should be similar to the rstan and blmr fits but something goes awry.
As said, there likely are errors in how I set up the model. Coming from social sciences I am not a particularly accomplished statistician even in frequentist tradition and I am pretty inexperienced in Bayesian methods.
I prefer rstanarm as an interface to Stan over others so would like to use it :)
I use R 4.0.4 on Windows 10.