jmsigner / amt

37 stars 13 forks source link

`redistribution_kernel()` error with categorical variable #94

Closed bsmity13 closed 5 months ago

bsmity13 commented 1 year ago

Credit to @picardis for uncovering this bug with an actual use case.

It seems that amt:::ssf_weights() is returning a model.matrix with an (Intercept) column. This doesn't seem to be a problem (because as.matrix(xyz[, names(coefs)]) later drops the intercept), except where there is a factor variable in the model. In that case, coxph() fits a parameter for all levels of the factor, whereas the default model.matrix() behavior assigns the first level to the intercept.

A reprex:

library(amt)
library(dplyr)

# Get landuse covariate
lu <- get_amt_fisher_covars()$landuse

# Format data for iSSA
dat <- amt_fisher %>% 
  filter(name == "Leroy") %>% 
  steps() %>% 
  random_steps() %>% 
  extract_covariates(lu, where = "both") %>% 
  mutate(landuse_end = factor(landuse_end))

# Fit iSSF
m <- fit_issf(dat, case_ ~ landuse_end + strata(step_id_))

# Create redistribution kernel (error occurs here)
rk <- redistribution_kernel(x = m,
                            start = make_start(dat[1, ]),
                            map = lu)

It seems like this can be fixed by modifying amt:::ssf_formula() to include - 1 in the model formula, but I'm not sure if there could be unintended consequences there. Seems like it should be fine, given the coxph() model will never have an intercept? My suggestion:

ssf_formula <- function (formula) 
{
  rhs <- strsplit(as.character(formula)[3], "\\+")[[1]]
  rhs <- rhs[-grep("strata", rhs)]
  stats::as.formula(paste("~", paste(rhs, collapse = "+"), "- 1")) # change is here
}

Also, a gentle reminder to any future readers that dummy coding your own factor variables in an iSSA is generally safer.

jmsigner commented 5 months ago

I created a permanent link to this here: https://github.com/jmsigner/amt/wiki/Simulations