jacob-long / panelr

Regression models and utilities for repeated measures and panel data

Including random slopes for factor variable is not working #54

Closed: ahcombs closed this issue 10 months ago

ahcombs commented 1 year ago

Hi there,

It doesn't appear to be possible to include random slopes for factor variables, which makes it pretty involved to model random slopes for a categorical variable with more than two levels (in my real data, I have individuals nested within cities, and I want to include a random slope for each city).

Reprex with the wage data is below.

The error appears to arise because e$data, produced by wb_prepare_data(), contains dummy columns named after the factor levels (in the example below, south_namessouth, a 0/1 column), while the random-effects term in e$fin_formula is still (south_names|id). The error is triggered when the prepared data is passed to prepare_lme4_formula(), specifically when prepare_lme4_formula() calls lme4::mkReTrms(). My best guess is that the name mismatch created by expanding the factor into dummy columns prevents lme4::mkReTrms() from finding the relevant random-slope columns.
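The naming mismatch can be reproduced with base R alone. This sketch assumes panelr's factor expansion follows the same "variable name + level" convention that stats::model.matrix() uses. The south_namessouth column mentioned above suggests it does, but that is an inference, not something checked in panelr's source. The toy data frame df is hypothetical.

```r
# Toy data: a two-level factor like south_names in the reprex.
df <- data.frame(south_names = factor(c("north", "south", "south", "north")))

# Expanding the factor yields one 0/1 column per non-reference level,
# named by pasting the variable name and the level together.
mm <- model.matrix(~ south_names, data = df)
colnames(mm)
# "(Intercept)" "south_namessouth"

# The factor's own name is not among the expanded columns, so a term that
# still refers to south_names has no matching column to look up.
"south_names" %in% colnames(mm)
# FALSE
```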

```r
library(panelr)
library(dplyr)

data("WageData")

WageData <- WageData %>% 
  mutate(south_names = factor(ifelse(south == 0, "north", "south")))
wages <- panel_data(WageData, id = id, wave = t)

# works when you use a binary variable
model <- wbm(lwage ~ wks + union + ms + occ | south + blk + fem | (south|id), 
             data = wages)
# but not when it's a factor instead--a problem if this had more than two levels
model <- wbm(lwage ~ wks + union + ms + occ | south_names + blk + fem | (south_names|id), 
             data = wages)
# Error in eval(predvars, data, env) : object 'south_names' not found
```
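Until this is fixed, one possible workaround is to create the dummy columns by hand, so the random-slope term names numeric columns that actually exist, as in the binary south case that works above. This is an untested sketch, not something panelr documents. The helper add_dummies() and the toy people/city data are hypothetical. The helper assumes the factor has no missing values, because model.matrix() drops incomplete rows by default and the cbind() would then fail.

```r
# Hypothetical helper (not part of panelr): append one 0/1 column per
# non-reference level of a factor, named the way model.matrix() names them.
# Assumes `var` has no NA values (model.matrix() would drop those rows).
add_dummies <- function(data, var) {
  mm <- model.matrix(reformulate(var), data = data)
  dummies <- mm[, -1, drop = FALSE]  # drop the intercept column
  cbind(data, as.data.frame(dummies))
}

# Toy data with a three-level factor, standing in for cities.
people <- data.frame(
  id = 1:6,
  city = factor(c("Austin", "Boston", "Chicago", "Austin", "Boston", "Chicago"))
)
people <- add_dummies(people, "city")
names(people)
# "id" "city" "cityBoston" "cityChicago"
```

With the real data, the resulting columns (here cityBoston and cityChicago) would then be listed explicitly in the formula, for example (cityBoston + cityChicago | id) in the third part of the wbm() formula. Whether wbm() treats these columns exactly like the binary south example is an assumption, not verified here.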
jacob-long commented 1 year ago

Thanks for the report, Aidan. I'm almost certain you have diagnosed the root cause correctly; I've sprouted quite a few gray hairs trying to properly support factors! I will work on this as soon as I am able. It should be fixable, since I already replace factors with their dummies in the other parts of the formula.