Closed tzoltak closed 4 years ago
I took a closer look at "find_data.R", specifically find_data.default, and I now understand why it worked for glm but not for svyglm: find_data.svyglm omits the part that handles na.action. Handling of subset should be there as well (though without the second step of searching the parent frame). So I think find_data.svyglm should look like this:
find_data.svyglm <- function(model, ...) {
  data <- model[["data"]]
  # handle subset
  if (!is.null(model[["call"]][["subset"]])) {
    subs <- try(eval(model[["call"]][["subset"]], data), silent = TRUE)
    if (inherits(subs, "try-error")) {
      subs <- TRUE
      warning("'find_data()' cannot locate variable(s) used in 'subset'")
    }
    data <- data[subs, , drop = FALSE]
  }
  # handle na.action
  if (!is.null(model[["na.action"]])) {
    data <- data[-model[["na.action"]], , drop = FALSE]
  }
  data
}
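To illustrate why the two steps are applied in this order, here is a minimal sketch using a plain glm fit as a stand-in for svyglm (so it runs with base R only). The data frame d and its columns are made up for the example. It shows that na.action holds positions within the already-subsetted data, so subset has to be applied first.

```r
# Hypothetical toy data: 'x' has missing values, 'g' is used for subsetting.
d <- data.frame(
  y = c(2.1, 3.4, 1.9, 4.2, 3.8, 2.5, 3.1, 2.8),
  x = c(1.2, 0.7, 0.5, NA, NA, 0.3, 1.8, 0.9),
  g = c("b", "a", "a", "a", "b", "a", "a", "a")
)

# Gaussian glm as a stand-in for svyglm (assumption made for this example).
fit <- glm(y ~ x, data = d, subset = g == "a")

# Step 1: re-evaluate the 'subset' expression stored in the call.
subs <- eval(fit[["call"]][["subset"]], d)
after_subset <- d[subs, , drop = FALSE]

# Step 2: drop the rows recorded in na.action. These are positions
# within the subsetted data, not original row numbers.
dropped <- fit[["na.action"]]
used <- after_subset[-as.integer(dropped), , drop = FALSE]

nrow(d)      # all rows supplied to glm()
nrow(used)   # rows actually used in estimation
```

Here the missing value in original row 4 is recorded at position 3 of the subsetted data, so applying na.action before subset would drop the wrong row.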
If there are missing values in variables used in a model, the model is estimated on a subset of the provided data, restricted to observations without missing values. However, the find_data() S3 method for svyglm models returns the whole dataset with no such subsetting applied. This causes trouble in margins() (see "design in margins.svyglm").

A solution could be changing line 109 in "find_data.R" to use model.frame(), as model.frame() returns exactly the data on which the model was estimated (and works well with svyglm models). By the way, it may also help to reduce memory usage since, in contrast to model[["data"]], it returns only the columns that are used in the model.

(However, making margins() work in such a situation also requires resolving design in margins.svyglm itself.)
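The difference between the supplied data and what model.frame() returns can be checked with a small sketch. It again uses a plain glm fit as a stand-in for svyglm (an assumption, to keep the example free of extra packages), and the data frame d is invented for the illustration.

```r
# Hypothetical toy data: 'x' has two missing values, 'unused' is not in the model.
d <- data.frame(
  y = c(2.1, 3.4, 1.9, 4.2, 3.8, 2.5),
  x = c(1.0, NA, 0.4, 2.2, 1.9, NA),
  unused = 1:6
)

# Gaussian glm as a stand-in for svyglm (assumption made for this example).
fit <- glm(y ~ x, data = d)

# model.frame() returns only the rows and columns used in estimation.
mf <- model.frame(fit)

dim(d)     # 6 rows, 3 columns
dim(mf)    # 4 rows, 2 columns
names(mf)  # "y" "x"
```

One thing to keep in mind with this approach: when the formula contains transformations such as log(x), the model frame holds the transformed term under that name rather than the original variable.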