yrosseel / lavaan

an R package for structural equation modeling and more
http://lavaan.org

Estimating regression weights from the covariance matrix fails #90

Closed stefvanbuuren closed 6 years ago

stefvanbuuren commented 6 years ago

I am trying out an old-fashioned way to treat missing data by means of the pairwise method.

    data <- airquality[, c("Ozone", "Solar.R", "Wind")]
    mu <- colMeans(data, na.rm = TRUE)
    cv <- cov(data, use = "pairwise")
    library(lavaan)
    fit <- lavaan("Ozone ~ Wind + Solar.R", 
                  sample.mean = mu, sample.cov = cv, 
                  sample.nobs = sum(complete.cases(data)))

I am getting

    Error in lav_model_estimate(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  : 
      lavaan ERROR: initial model-implied matrix (Sigma) is not positive definite;
      check your model and/or starting parameters.

which may be a result of the pairwise deletion. However, I get the same error after

    cv <- cov(data, use = "complete")

How can I use lavaan to estimate regression weights from this covariance matrix?

TDJorgensen commented 6 years ago

Your R syntax looks correct, but your lavaan model syntax does not specify the full model, only the 2 regression paths. If you don't want to specify the residual variance or intercept explicitly in the model syntax, you can use the shortcut arguments int.ov.free = TRUE and auto.var = TRUE.

    fit <- lavaan("Ozone ~ Wind + Solar.R", 
                  sample.mean = mu, sample.cov = cv, 
                  sample.nobs = sum(complete.cases(data)), 
                  int.ov.free = TRUE, auto.var = TRUE)

Or you could use the sem() function, which is a wrapper around lavaan() that turns those shortcuts on by default.

    fit <- sem("Ozone ~ Wind + Solar.R", 
               sample.mean = mu, sample.cov = cv, 
               sample.nobs = sum(complete.cases(data)))

Once those model parameters are specified, lavaan's default starting values should yield a positive definite initial matrix.
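
One way to see what the shortcut arguments change is to look at the parameter table without fitting anything. The following is a minimal sketch, assuming lavaanify() builds its table the same way lavaan() does internally. The meanstructure = TRUE setting and the helper is_free() are assumptions added for illustration, not part of the original report.

```r
library(lavaan)

model <- "Ozone ~ Wind + Solar.R"

# lavaanify() with its own defaults: only what the syntax spells out is free
bare <- lavaanify(model)

# Same syntax with the two shortcuts switched on. meanstructure = TRUE is an
# assumption here, standing in for the fact that sample means are supplied.
full <- lavaanify(model, meanstructure = TRUE,
                  int.ov.free = TRUE, auto.var = TRUE)

# Helper (hypothetical name): is a given parameter free in a parameter table?
is_free <- function(pt, lhs, op, rhs = "") {
  any(pt$lhs == lhs & pt$op == op & pt$rhs == rhs & pt$free > 0)
}

is_free(bare, "Ozone", "~~", "Ozone")  # residual variance without shortcuts
is_free(full, "Ozone", "~~", "Ozone")  # residual variance with auto.var
is_free(full, "Ozone", "~1")           # intercept with int.ov.free

# Inspect the rows directly
full[, c("lhs", "op", "rhs", "free")]
```

If the residual variance of Ozone is not a free parameter, the initial model-implied covariance matrix has nothing sensible to put on that diagonal entry, which fits the error reported above.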

stefvanbuuren commented 6 years ago

Thanks for your kind response. I wanted a regression with an intercept and a residual variance, and had wrongly assumed that the regression formula works the same way as in base R.

The following code is probably what I should have specified:

    fit <- lavaan("Ozone ~ 1 + Wind + Solar.R
                   Ozone ~~ Ozone",
                  sample.mean = mu, sample.cov = cv, 
                  sample.nobs = sum(complete.cases(data)))

These estimates are as expected.
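
As a cross-check that needs no SEM software, the regression weights can be computed from the moments directly and compared with lm(). This is a sketch under the assumption that both the means and the covariance matrix come from the complete cases. With pairwise moments, or with means taken via na.rm = TRUE, the numbers will generally differ from lm(). The names cc, x, beta, alpha and ols are illustrative.

```r
# Complete-case subset of the same three airquality columns
data <- airquality[, c("Ozone", "Solar.R", "Wind")]
cc <- data[complete.cases(data), ]

mu <- colMeans(cc)  # complete-case means
cv <- cov(cc)       # complete-case covariance matrix

x <- c("Wind", "Solar.R")

# Slopes: solve S_xx %*% beta = s_xy for beta
beta <- solve(cv[x, x], cv[x, "Ozone"])

# Intercept: mean of the outcome minus the fitted part at the predictor means
alpha <- mu["Ozone"] - sum(beta * mu[x])

# Ordinary least squares on the same rows, for comparison
ols <- coef(lm(Ozone ~ Wind + Solar.R, data = cc))

all.equal(unname(c(alpha, beta)), unname(ols))
```

The slopes and intercept from a fitted lavaan model can be compared against the same values with coef(fit).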