leeper / margins

An R Port of Stata's 'margins' Command
https://cloud.r-project.org/package=margins
Other
260 stars 39 forks source link

margins() fails when one or more coefficients are not estimated #82

Closed leeper closed 6 years ago

leeper commented 6 years ago

margins() chokes on interactions between factors where some levels of the interaction are missing and therefore not estimated:

mtcars$cyl <- factor(mtcars$cyl)
mtcars$gear <- factor(mtcars$gear)
table(mtcars$cyl, mtcars$gear)
##    
##      3  4  5
##   4  1  8  2
##   6  2  4  1
##   8 12  0  2
margins::margins(lm(mpg ~ cyl * gear, data = mtcars))
## Error in jacobian %*% vcov : non-conformable arguments
## In addition: There were 50 or more warnings (use warnings() to see the first 50)
traceback()
## 6: diag(jacobian %*% vcov %*% t(jacobian))
## 5: delta_once(data = data, model = model, variables = variables, 
##        type = type, vcov = vcov, weights = weights, eps = eps, varslist = varslist, 
##        ...)
## 4: get_effect_variances(data = data, model = model, variables = names(mes), 
##        type = type, vcov = vcov, vce = vce, iterations = iterations, 
##        weights = weights, eps = eps, varslist = varslist, ...)
## 3: build_margins(model = model, data = data_list[[i]], variables = variables, 
##        type = type, vcov = vcov, vce = vce, iterations = iterations, 
##        unit_ses = unit_ses, eps = eps, varslist = varslist, ...)
## 2: margins.lm(lm(mpg ~ cyl * gear, data = mtcars))
## 1: margins::margins(lm(mpg ~ cyl * gear, data = mtcars))

The same issue arises in the face of perfect multicollinearity:

mtcars$wt2 <- 2 * mtcars$wt
margins::margins(lm(mpg ~ wt + wt2, data = mtcars))
## Error in jacobian %*% vcov : non-conformable arguments

The issue is in variance estimation. marginal_effects() works fine:

str(margins::marginal_effects(lm(mpg ~ cyl * gear, data = mtcars)))
'data.frame':   32 obs. of  4 variables:
 $ dydx_cyl6 :Classes 'marginaleffect', 'numeric'  num [1:32] -7.18 -7.18 -7.18 -1.75 -1.75 ...
 $ dydx_cyl8 :Classes 'marginaleffect', 'numeric'  num [1:32] -6.45 -6.45 -6.45 -6.45 -6.45 ...
 $ dydx_gear4:Classes 'marginaleffect', 'numeric'  num [1:32] -7.11e-15 -7.11e-15 5.43 -7.11e-15 5.43 ...
 $ dydx_gear5:Classes 'marginaleffect', 'numeric'  num [1:32] -0.05 -0.05 6.7 -0.05 0.35 ...

Clearly the issue is therefore in jacobian() as it attempts to take a partial derivative with respect to a coefficient that is NA. The solution will be to remove coefficients that are missing:

coef(lm(mpg ~ wt + wt2, data = mtcars))
## (Intercept)          wt         wt2 
##   37.285126   -5.344472          NA

before doing anything. That can happen in find_terms_in_model() but unfortunately terms() is pre-estimation not post, so it doesn't tell us what to drop:

attr(terms(lm(mpg ~ wt + wt2, data = mtcars)), "dataClasses")
##       mpg        wt       wt2 
## "numeric" "numeric" "numeric"
leeper commented 6 years ago

Stata considers the AMEs for the first kind of model (where some interaction levels are missing from the data) to be not estimable:

. sysuse auto, clear
(1978 Automobile Data)

. tab rep78 foreign

    Repair |
    Record |       Car type
      1978 |  Domestic    Foreign |     Total
-----------+----------------------+----------
         1 |         2          0 |         2 
         2 |         8          0 |         8 
         3 |        27          3 |        30 
         4 |         9          9 |        18 
         5 |         2          9 |        11 
-----------+----------------------+----------
     Total |        48         21 |        69 

. quietly regress price i.rep78##i.foreign

. margins, dydx(*)

Average marginal effects                        Number of obs     =         69
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 2.rep78 3.rep78 4.rep78 5.rep78 1.foreign

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |          .  (not estimable)
          3  |          .  (not estimable)
          4  |          .  (not estimable)
          5  |          .  (not estimable)
             |
     foreign |
    Foreign  |          .  (not estimable)
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

That seems wrong to me but resolving the disparity will either require conforming to Stata or going down a different path.

leeper commented 6 years ago

Oops, this is a duplicate of #71.