Problems with binomial models

jgabry commented 1 year ago

Something we should talk about at the next meeting:

I think we've only been testing on Bernoulli models, but binomial model syntax like y_count|trials(n) in brms or cbind(y_count, trials - y_count) in rstanarm and lme4 isn't compatible with how we're using the formula in some methods, e.g.:

https://github.com/lauken13/mrpkit/blob/b57eef492aa9b39529384aaec1d4d5bd8e2ec21b/R/SurveyFit.R#L181

This won't result in "y_count" unless the formula is simple like y_count ~ ....

Binomial models will also likely break other code we have. For example, this code seems to assume a binary outcome: https://github.com/lauken13/mrpkit/blob/b57eef492aa9b39529384aaec1d4d5bd8e2ec21b/R/SurveyFit.R#L194-L196

There are other places too.

jgabry commented 1 year ago

lhs_var <- as.character(self$formula()[[2]])

This happens in the plot method and some other places too

jgabry commented 1 year ago

I will figure out how to detect if a model is binomial vs Bernoulli and also how to process the formula for a binomial model to get the correct name of the LHS variable.

@lauken13 will go through and figure out where we're currently assuming bernoulli (binary outcome) where we shouldn't be

jgabry commented 1 year ago

I will figure out how to detect if a model is binomial vs Bernoulli

Here are some helper functions to detect whether a model is binomial (not Bernoulli model) and which package was used to fit it.

For brms this is easy because it has a separate bernoulli family, whereas rstanarm and lme4 use binomial for both Bernoulli and binomial models. So to detect if a model was fit with brms and is a binomial (not Bernoulli) model we just need:

#' @param fit Fitted model object (not SurveyFit object)
is_brms_binomial <- function(fit) {
  class(fit)[1] == "brmsfit" && family(fit)$family == "binomial"
}

For rstanarm and lme4 we need to additionally differentiate between binomial and Bernoulli, e.g. by checking if the outcome was expressed using cbind, e.g. cbind(successes, failures). So something like this should work:

is_rstanarm_binomial <- function(fit) {
  class(fit)[1] == "stanreg" && 
  family(fit)$family == "binomial" && 
  grepl("^cbind\\(", as.character(formula(fit))[[2]])
}

is_lme4_binomial <- function(fit) {
  class(fit)[1] == "glmerMod" && 
  family(fit)$family == "binomial" && 
  grepl("^cbind\\(", as.character(formula(fit))[[2]])
}

One issue is that technically binomial model can be specified without cbind by using the weights argument. I think we can just not support models fit using that syntax because it's annoying to handle (we'd need to add an error message though).

jgabry commented 1 year ago

to get the correct name of the LHS variable

still need to figure this out but will work on this now that the first part (binomial vs Bernoulli) is figured out

jgabry commented 1 year ago

I think I figured out how to get the info we need from binomial models. Will share in our meeting later

jgabry commented 1 year ago

I think I figured out how to get the info we need from binomial models. Will share in our meeting later

I figured it out for brms, rstanarm, lme4, but still need to think through what to do with custom modeling functions that might use different formula syntax. That could be an issue. Might have to force using familiar syntax.

jgabry commented 1 year ago

@lauken13 All the functions I wrote for this are in the misc.R file. I think we just need to check the summary and plot methods for how they use Bernoulli vs binomial

jgabry commented 9 months ago

@lauken13 will update summary function

@jgabry will add detection of custom modeling function and lhs variable name in that case

jgabry commented 2 days ago

See handle-binomial branch

lauken13 / mrpkit

Problems with binomial models #156