yrosseel / lavaan

an R package for structural equation modeling and more
http://lavaan.org
412 stars 99 forks source link

Pairwise likelihood estimation with sampling weights #308

Closed haziqj closed 7 months ago

haziqj commented 8 months ago

These changes allow for weights to be used for pairwise likelihood estimation. Main changes are

  1. Turn off "force MLR" when !is.null(sampling.weights) and estimator = "PML".
  2. Build weighted pairwise frequency counts and store in lavcache.
  3. Ensure weights enter the PL objective function to be minimised.
  4. Ensure the Hessian employs the weights correctly.
  5. Modify the estimate of the J variability matrix (i.e. the crossproduct) with weights.

(4 & 5 builds the correct sandwich variance estimator)

Initially I contemplated on modifying the frequency counts in lavTables() to show weighted frequency counts and proportions, but I don't see this behaviour for e.g. using estimator = "ML" so opted against doing this.

Just a note to say that I am taking a lot from Myrsini's prior work so her style of coding will be apparent in the commits.

haziqj commented 8 months ago

A reprex as proof of concept:

library(lavaan)
#> This is lavaan 0.6-17.9001
#> lavaan is FREE software! Please report any bugs.
library(tidyverse)
set.seed(123)

# Generate superpopulation of 5 binary items with prob. of selection inversely
# proportional to eta1 using underlying variable method
N <- 10000
pop.model <- '
eta1 =~ 0.8 * y1 + 0.7 * y2 + 0.47 * y3 + 0.38 * y4 + 0.34 * y5
'
tau <- c(1.43, 0.55, 0.13, 0.72, 1.13)
ystar <- simulateData(pop.model, sample.nobs = N, standardized = TRUE)
Pop <- 
  {1 * (ystar > matrix(tau, nrow = N, ncol = 5, byrow = TRUE))} |>
  as_tibble() |>
  mutate(across(everything(), \(x) ordered(x, levels = c(0, 1))),
         prob = 1 / (1 + exp(ystar[, 1])))

# Generate an informative sample from the population
Data <- 
  slice_sample(Pop, n = 2000, weight_by = prob, replace = FALSE) |>
  mutate(wt = 1 / prob)

# Fit models
mod <- '
eta1 =~ NA * y1 + y2 + y3 + y4 + y5
'
fit1 <- sem(model = mod, data = Data, estimator = "PML", std.lv = TRUE)
fit2 <- sem(model = mod, data = Data, estimator = "PML", std.lv = TRUE,
            sampling.weights = "wt")

# Check bias
truth <- c(0.8, 0.7, 0.47, 0.38, 0.34, tau)
coef(fit1) - truth  # Ignoring weights
#> eta1=~y1 eta1=~y2 eta1=~y3 eta1=~y4 eta1=~y5    y1|t1    y2|t1    y3|t1 
#>    0.017   -0.023   -0.119   -0.031   -0.113    0.513    0.234    0.138 
#>    y4|t1    y5|t1 
#>    0.073    0.118
coef(fit2) - truth  # Weighted
#> eta1=~y1 eta1=~y2 eta1=~y3 eta1=~y4 eta1=~y5    y1|t1    y2|t1    y3|t1 
#>    0.012    0.028   -0.087    0.009   -0.103   -0.088   -0.016   -0.004 
#>    y4|t1    y5|t1 
#>   -0.069    0.037

Created on 2023-11-08 with reprex v2.0.2

yrosseel commented 7 months ago

Thanks a lot. This works really well. Merged now.