leeper / margins

An R Port of Stata's 'margins' Command
https://cloud.r-project.org/package=margins
Other
260 stars 39 forks source link

R and Stata covariance matrices inconsistent? #163

Open jackobailey opened 3 years ago

jackobailey commented 3 years ago

Hello,

Writing to report a possible bug. See below for a reproducible R/Stata toy example.

First, generate a toy data set in R:

library(tidyverse)
library(margins)
library(haven)

set.seed(666)

n <- 1000
x <- rnorm(n, 0, 1)
r <- sample(0:1, n, replace = T)
y <- 0 + 0.25*x + 0.5*r + 0.75*x*r + rnorm(n, 0, 1)

dta <- data.frame(y, x, r)

write_dta(dta, "file/path/dummy.dta")

Next, fit a simple linear model and get margins using margins.

m1 <- lm(y ~ x*r, dta)
summary(m1)
mg1 <- 
  margins(
    m1,
    variables = "x",
    at = list(r = 0:1),
    vce = "delta"
  )

If we run vcov(mg1) we get the covariance matrix for the margins object and it looks something like this:

              dydx_x.1      dydx_x.2
dydx_x.1  2.036632e-03 -2.337855e-11
dydx_x.2 -2.337855e-11  2.002420e-03

Next, run the same command in Stata:

use "file/path/dummy.dta"
regress y c.x##i.r
matrix list  e(V)
margins, dydx(x) at(r=(0 1)) post
matrix list  e(V)

This gives the following covariance matrix instead:

                 x:         x:
                 1.         2.
               _at        _at
x:1._at  .00203663
x:2._at  3.036e-18  .00200242

My question: is this a bug? Or am I doing something stupid?

jackobailey commented 3 years ago

Here's the R and Stata regression output.

R:

Call:
lm(formula = y ~ x * r, data = dta)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.86028 -0.65072 -0.00758  0.69345  2.74165 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.07921    0.04432   1.787   0.0742 .  
x            0.24933    0.04513   5.525 4.21e-08 ***
r            0.46209    0.06253   7.390 3.11e-13 ***
x:r          0.70239    0.06355  11.052  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9879 on 996 degrees of freedom
Multiple R-squared:  0.3453,    Adjusted R-squared:  0.3433 
F-statistic: 175.1 on 3 and 996 DF,  p-value: < 2.2e-16

Stata:

. regress y c.x##i.r

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(3, 996)       =    175.09
       Model |   512.58423         3   170.86141   Prob > F        =    0.0000
    Residual |  971.973352       996  .975876859   R-squared       =    0.3453
-------------+----------------------------------   Adj R-squared   =    0.3433
       Total |  1484.55758       999  1.48604363   Root MSE        =    .98786

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .2493274   .0451291     5.52   0.000     .1607685    .3378864
         1.r |   .4620928   .0625298     7.39   0.000     .3393875    .5847981
             |
       r#c.x |
          1  |   .7023906   .0635535    11.05   0.000     .5776765    .8271048
             |
       _cons |   .0792121   .0443173     1.79   0.074    -.0077538    .1661781
------------------------------------------------------------------------------