Open jgabry opened 1 year ago
lhs_var <- as.character(self$formula()[[2]])
This happens in the plot method and some other places too
I will figure out how to detect if a model is binomial vs Bernoulli and also how to process the formula for a binomial model to get the correct name of the LHS variable.
@lauken13 will go through and figure out where we're currently assuming bernoulli (binary outcome) where we shouldn't be
I will figure out how to detect if a model is binomial vs Bernoulli
Here are some helper functions to detect whether a model is binomial (not Bernoulli model) and which package was used to fit it.
For brms this is easy because it has a separate bernoulli
family, whereas rstanarm and lme4 use binomial
for both Bernoulli and binomial models. So to detect if a model was fit with brms and is a binomial (not Bernoulli) model we just need:
#' @param fit Fitted model object (not SurveyFit object)
is_brms_binomial <- function(fit) {
class(fit)[1] == "brmsfit" && family(fit)$family == "binomial"
}
For rstanarm and lme4 we need to additionally differentiate between binomial and Bernoulli, e.g. by checking if the outcome was expressed using cbind, e.g. cbind(successes, failures)
. So something like this should work:
is_rstanarm_binomial <- function(fit) {
class(fit)[1] == "stanreg" &&
family(fit)$family == "binomial" &&
grepl("^cbind\\(", as.character(formula(fit))[[2]])
}
is_lme4_binomial <- function(fit) {
class(fit)[1] == "glmerMod" &&
family(fit)$family == "binomial" &&
grepl("^cbind\\(", as.character(formula(fit))[[2]])
}
One issue is that technically binomial model can be specified without cbind by using the weights
argument. I think we can just not support models fit using that syntax because it's annoying to handle (we'd need to add an error message though).
to get the correct name of the LHS variable
still need to figure this out but will work on this now that the first part (binomial vs Bernoulli) is figured out
I think I figured out how to get the info we need from binomial models. Will share in our meeting later
I think I figured out how to get the info we need from binomial models. Will share in our meeting later
I figured it out for brms, rstanarm, lme4, but still need to think through what to do with custom modeling functions that might use different formula syntax. That could be an issue. Might have to force using familiar syntax.
@lauken13 All the functions I wrote for this are in the misc.R file. I think we just need to check the summary and plot methods for how they use Bernoulli vs binomial
@lauken13 will update summary function
@jgabry will add detection of custom modeling function and lhs variable name in that case
See handle-binomial branch
Something we should talk about at the next meeting:
I think we've only been testing on Bernoulli models, but binomial model syntax like
y_count|trials(n)
in brms orcbind(y_count, trials - y_count)
in rstanarm and lme4 isn't compatible with how we're using the formula in some methods, e.g.:https://github.com/lauken13/mrpkit/blob/b57eef492aa9b39529384aaec1d4d5bd8e2ec21b/R/SurveyFit.R#L181
This won't result in
"y_count"
unless the formula is simple likey_count ~ ...
.Binomial models will also likely break other code we have. For example, this code seems to assume a binary outcome: https://github.com/lauken13/mrpkit/blob/b57eef492aa9b39529384aaec1d4d5bd8e2ec21b/R/SurveyFit.R#L194-L196
There are other places too.