leeper / margins

An R Port of Stata's 'margins' Command
https://cloud.r-project.org/package=margins
Other
263 stars 40 forks source link

Using margins for poststratification adjusted average treatment effects? #153

Open dfrankow opened 4 years ago

dfrankow commented 4 years ago

Thanks for your work on the margins package.

We would like to estimate average treatment effects on a different population than our survey population. One technique to do this is poststratification (see here). The basic idea as I understand it is to make a model with the data we have, stratifying by variables we believe to be explanatory (e.g., age, sex, etc. for surveys), then use that model to predict on the population we are interested in, and make prediction averages weighted by the target population sizes for those strata (see here).

Can we pass a model and a new target population into margins to do that? When I tried supplying a different population as data, the predictions seemed the same.

For example (from here):

# From https://www.r-bloggers.com/gelmans-mrp-in-r-what-is-this-all-about/

set.seed(123)
library(TeachingSampling)
library(dplyr)
library(lme4)
library(margins)

data("Lucy")

SLucy <- sample_n(Lucy, size = 1000)

# Multilevel regression on the sample
M1 <- lmer(Income ~ Level + (1 | Zone), data = SLucy)

fixef(M1)
summary(margins(M1))
summary(margins(M1, data=Lucy))
summary(margins(M1, data=SLucy))

The output:

> fixef(M1)
(Intercept) LevelMedium  LevelSmall 
  1185.4560   -529.7583   -867.2332 
> summary(margins(M1))
      factor       AME      SE        z      p     lower     upper
 LevelMedium -529.7583 23.3926 -22.6464 0.0000 -575.6069 -483.9096
  LevelSmall -867.2332 24.3405 -35.6293 0.0000 -914.9397 -819.5268
> summary(margins(M1, data=Lucy))
      factor       AME      SE        z      p     lower     upper
 LevelMedium -529.7583 23.3926 -22.6464 0.0000 -575.6069 -483.9096
  LevelSmall -867.2332 24.3405 -35.6293 0.0000 -914.9397 -819.5268
> summary(margins(M1, data=SLucy))
      factor       AME      SE        z      p     lower     upper
 LevelMedium -529.7583 23.3926 -22.6464 0.0000 -575.6069 -483.9096
  LevelSmall -867.2332 24.3405 -35.6293 0.0000 -914.9397 -819.5268

Margins agrees with the fixed effects from the model.

I'd expect at least the error to be smaller if we had more data (e.g., Lucy instead of the smaller SLucy). However, all terms including error are exactly the same.

Perhaps I just misunderstand the nature of margins on fixed effect models?

By the way, the blog post does poststratification by doing the reweighting with some matrix math:

# Mean predictions for the sample
grouped <- group_by(SLucy, Zone, Level)
sum <- summarise(grouped, mean2 = mean(Pred))
(Mupred <- matrix(sum$mean2, ncol = 5, nrow = 3))

# Size of the real post-strata
(Np <- table(Lucy$Level, Lucy$Zone))

# Mean income estimation per zone using poststratification
colSums(Np * Mupred) / table(Lucy$Zone)

However, that won't give me error terms like margins would.

dfrankow commented 4 years ago

I thought I'd see if other ways to compute (other vce-s) would produce different results for "data".

However, other vce values didn't work for me with or without the "data" argument:

> summary(margins(M1, vce="bootstrap"))
Error in `*tmp*`[["call"]] : this S4 class is not subsettable
> summary(margins(M1, vce="simulation"))
Error in `[[<-`(`*tmp*`, "model", value = NULL) : 
  [[<- defined for objects of type "S4" only for subclasses of environment

> summary(margins(M1, data=Lucy, vce="bootstrap"))
Error in `*tmp*`[["call"]] : this S4 class is not subsettable
> summary(margins(M1, data=Lucy, vce="simulation"))
Error in `[[<-`(`*tmp*`, "model", value = NULL) : 
  [[<- defined for objects of type "S4" only for subclasses of environment

Not sure why yet. I might file this as a separate issue around lme4.

dfrankow commented 4 years ago

Yet some of the average marginal effects change with a different dataset.

> mtcars$make <- stringr::str_extract(string = row.names(mtcars), pattern = "\\w+\\b")
> M1 <- lmer(mpg ~ cyl * hp + wt + (1|make), data = mtcars)
boundary (singular) fit: see ?isSingular
> summary(M1)
Linear mixed model fit by REML ['lmerMod']
Formula: mpg ~ cyl * hp + wt + (1 | make)
   Data: mtcars

REML criterion at convergence: 153.1

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.4915 -0.6308 -0.2750  0.5424  1.9096 

Random effects:
 Groups   Name        Variance Std.Dev.
 make     (Intercept) 0.000    0.000   
 Residual             5.027    2.242   
Number of obs: 32, groups:  make, 22

Fixed effects:
             Estimate Std. Error t value
(Intercept) 52.017520   4.916935  10.579
cyl         -2.742125   0.800228  -3.427
hp          -0.163594   0.052122  -3.139
wt          -3.119815   0.661322  -4.718
cyl:hp       0.018954   0.006645   2.852

Correlation of Fixed Effects:
       (Intr) cyl    hp     wt    
cyl    -0.846                     
hp     -0.903  0.688              
wt     -0.055 -0.365 -0.029       
cyl:hp  0.946 -0.789 -0.979  0.025
convergence code: 0
boundary (singular) fit: see ?isSingular

> fixef(M1)
(Intercept)         cyl          hp          wt      cyl:hp 
52.01752019 -2.74212515 -0.16359434 -3.11981472  0.01895364 
> margins(M1, data = mtcars[mtcars$am == 0, ])
Average marginal effects
    cyl       hp    wt
 0.2954 -0.03192 -3.12
> margins(M1, data = mtcars)
Average marginal effects
     cyl       hp    wt
 0.03814 -0.04632 -3.12
> margins(M1, data = sample_n(mtcars, 20))
Average marginal effects
      cyl       hp    wt
 -0.02133 -0.05177 -3.12