rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
364 stars 32 forks source link

lavaan probit model: calculation of simple slopes and std. error #517

Open albertostefanelli opened 2 days ago

albertostefanelli commented 2 days ago

I am trying to implement the calculation for simple slopes estimation for probit models in lavaan as it is currently not support in semTools (I will cross-post).

The idea is to be able to plot the slope of a regression coefficient and the corresponding CI. So far, we can achieve this in lavaan + emmeans using a linear probability model.


library(semTools)
library(lavaan)
library(emmeans)
library(ggplot2)
# Load the PoliticalDemocracy dataset
data("PoliticalDemocracy")

# Create a binary indicator for dem60 (e.g., using median as a threshold)
PoliticalDemocracy$dem60_bin <- ifelse(PoliticalDemocracy$y1 >= mean(PoliticalDemocracy$y1), 1, 0)

### LPM

model <- '
  # Latent variable definition
  ind60 =~ y1 + y2 + y3 + y4
  # Probit regression with ind60 predicting dem60_bin
  dem60_bin ~ b*ind60

'

# Fit the model using WLSMV estimator for binary outcomes
fit <- sem(model, data = PoliticalDemocracy, meanstructure=TRUE)

summary(fit)

slope <- emmeans(fit, "ind60", 
                 lavaan.DV = "dem60_bin", 
                 at = list(ind60 = seq(-2, 2, 0.01)),
                 rg.limit = 60000)|> data.frame()

# Plot the marginal effect of the latent variable ind60 with standard errors
ggplot(slope, aes(x = ind60, y = emmean)) +
  geom_line(color = "blue") +
  geom_ribbon(aes(ymin = asymp.LCL, ymax = asymp.UCL), 
              alpha = 0.2, fill = "lightblue") +
  labs(
    title = "Marginal Effect of ind60 (Latent Variable) on the Predicted Probability of dem60_bin",
    x = "ind60 (Latent Variable)",
    y = "Marginal Effect of ind60"
  ) +
  theme_minimal()

However, semTools does not support any link function at this point so I have to relay on manual calculations to obtain the predicted probabilities. So far, I am able to estimate the change in probability for the slope and the marginal probabilities. However, I am pretty sure that the way I am calculating the SE is wrong as they too small compared to the lpm model. any advice on this is highly appreciated.


### PROBIT LINK 
# Define the probit model with ind60 as a latent variable
model <- '
  # Latent variable definition
  ind60 =~ y1 + y2 + y3 + y4
  # intercept/threeshold
  dem60_bin|"t1"*t1

  # Probit regression with ind60 predicting dem60_bin
  dem60_bin ~ b*ind60
  # the slope exprssed in probabilities
  slope :=  pnorm(-t1)*b

'

# Fit the model using WLSMV estimator for binary outcomes
fit <- sem(model, data = PoliticalDemocracy, ordered = "dem60_bin", estimator = "WLSMV")
summary(fit)
# Extract model coefficients
coef_fit <- coef(fit)
intercept <- coef_fit["t1"]
beta_ind60 <- coef_fit["b"]
params <- parameterEstimates(fit)
se_beta_ind60 <- params[params$label == "b", "se"]

# Define a range of ind60 values for the marginal effect calculation
# Here, we will use the predicted values from the latent variable
ind60_seq <- seq(-2, 2, length.out = 100)  # Assuming a standard range for latent variable

# Calculate marginal effects for each value of ind60 in ind60_seq
marginal_effects_ind60 <- sapply(ind60_seq, function(ind60_value) {
  # Linear predictor for given ind60_value
  linear_predictor <- -intercept + (beta_ind60 * ind60_value)
  pdf_value <- pnorm(linear_predictor)
  #marginal_effect <- pdf_value * beta_ind60
  return(pdf_value)
})

# Standard errors for marginal effects using the Delta Method
se_marginal_effects <- sapply(ind60_seq, function(ind60_value) {
  # Linear predictor for given ind60_value
  linear_predictor <- -intercept + beta_ind60 * ind60_value
  pdf_value <- dnorm(linear_predictor)

  # Delta Method: SE = |f'(x)| * SE(beta)
  marginal_effect_se <- abs(pdf_value) * se_beta_ind60
  return(marginal_effect_se)
})

plot_data <- data.frame(ind60 = ind60_seq, marginal_effect = marginal_effects_ind60, se_marginal_effect = se_marginal_effects)

# Plot the marginal effect of the latent variable ind60 with standard errors
ggplot(plot_data, aes(x = ind60, y = marginal_effect)) +
  geom_line(color = "blue") +
  geom_ribbon(aes(ymin = marginal_effect - 1.96 * se_marginal_effect, ymax = marginal_effect + 1.96 * se_marginal_effect), 
              alpha = 0.2, fill = "lightblue") +
  labs(
    title = "Marginal Effect of ind60 (Latent Variable) on the Predicted Probability of dem60_bin",
    x = "ind60 (Latent Variable)",
    y = "Marginal Effect of ind60"
  ) +
  theme_minimal()
rvlenth commented 2 days ago

However, semTools does not support any link function at this point ...

Is that true? I doubt it! But even if so, it seems to be a simple matter, as we can just manually specify it:

emmeans(..., tran = "probit")

However, if you're trying to estimate slopes, note you can't back-transform slopes when there is a transformation or link present. That said, the emtrends function is pretty flexible in terms of computing difference quotients relative to other scales; see its documentation.

albertostefanelli commented 2 days ago

thanks for the answer. I confirm that that both the emtrends and emmeans functions provided by the semTools package do not support binomial dependent variables. Running any of these functions results in the following error

Error : {dem60_bin} is an ordered variable! Currently only continuous DVs are supported.
See `?lavaan2emmeans` for more info.
Error in (function (object, at, cov.reduce = mean, cov.keep = get_emm_option("cov.keep"),  : 
  Perhaps a 'data' or 'params' argument is needed

I am trying to reproduce the slope estimation form a LPM model (first model in my previous post ) but using a probit model (second model in my previous post), all in lavaan. Any suggestion on the code contained in my previous post is appreciated. if I manage to figure out how to calculate the std. errors manually I can do a pull request in the semTool package to implement the support for the probit link.

rvlenth commented 2 days ago

OK, well that clarifies things. It may be possible to get qdrg() to work with those models. But it seems to me you'll get more help from the semTools developers than from me.

rvlenth commented 7 hours ago

My additional comment is that there must be a reason that ordered dependent variables are not supported by semTools::lavaan2emmeans. So I suspect that it is not easy to do what you are trying to do.

On the other hand, it looks like you just have a binary response, but for some reason the DV is ordered. Maybe if it is just an ordinary probit model (unordered factor), it will be supported. But I don't understand how it came to be treated as an ordered factor.

albertostefanelli commented 6 hours ago

I have posted on the semTools github but I am afraid I will get no response for the devs any time soon. However, just for reference. to specify a probit model with an unordered factor lavaan uses the ordered = argument. See this discussion from the dev of lavaan