sfcheung / semptools

Helper functions for customizing plots by semPlot::semPaths
https://sfcheung.github.io/semptools
GNU General Public License v3.0
6 stars 2 forks source link

Significance asterisks not working for latent variables' residual variances and for the covariances between the latent variables (in categorical data) #162

Closed jorgesinval closed 3 weeks ago

jorgesinval commented 3 weeks ago

The significance asterisks are not completely working when the model uses categorical data. In this example, the significance asterisks are not appearing for the latent variables' residual variances and for the covariances between the latent variables.

library(lavaan)
library(semPlot)
library(semptools)

HS.model <- ' visual  =~ x1 + x2 + x3
               textual =~ x4 + x5 + x6
               speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, data = HolzingerSwineford1939, ord=T)

summary(fit, standardized=TRUE)

plot_model <- semPlotModel(fit)

plot_sem <- semPlot::semPaths(object = plot_model,
                              thresholds = F,
                              whatLabels = "stand",
                              intercepts = F,
                              style = "ram",
                              DoNotPlot = T)

p_p <- mark_sig(semPaths_plot = plot_sem,
                  object = fit)
plot(p_p)

image

If you add "std.all" it gets messy (with items' residuals getting asterisks, when they should not have for this particular model):

p_p <- mark_sig(semPaths_plot = plot_sem,
                  object = fit, std_type = "std.all")
plot(p_p)

image

sfcheung commented 3 weeks ago

Thanks a lot for reporting this issue. It has been fixed in the developmental version, which can be installed from GitHub:

remotes::install_github("sfcheung/semptools")

Attached below Is a reprex (a long one), adapted from your example.

Regarding the issue about the asterisks for the residuals, I would like to explain a little more. I agree that for this model, the (unstandardized) residuals are not supposed to have p-values. This is the case in the output of summary() and lavaan::parameterEstimates(). However, for the standardized solution, lavaan reports p-values for these residuals (after standardization) in lavaan::standardizedSolution() (based on delta-method standard errors, as far as I understand). We assume users would like to have asterisks consistent with the lavaan output and so will use the p-values from standardizedSolution().

The residuals do not have p-values in the original (unstandardized) solution. Therefore, if you prefer to use the p-values for the original solution but plot the standardized parameter estimates, then your first example can achieve this (just set object to the lavaan output and do not use std_type, even though the standardized estimates are plotted).

If you would like to use the standardized solution p-values, but would not like to have asterisks for the residuals, then you need to manually remove the p-values of the residuals from the output of the standardizedSolution(), and pass the table to ests, as shown in the last plot

I hope this version can solve your problem.

Thanks again.

# https://github.com/sfcheung/semptools/issues/162
library(lavaan)
#> This is lavaan 0.6-19
#> lavaan is FREE software! Please report any bugs.
library(semPlot)
library(semptools)
library(semhelpinghands)
#> 
#> Attaching package: 'semhelpinghands'
#> The following object is masked from 'package:base':
#> 
#>     sort_by

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# Use a small sample to have variation in p-values
dat <- HolzingerSwineford1939[1:160, paste0("x", 1:9)]
dat <- data.frame(scale(dat))
dat <- lapply(dat,
              function(x) {
                  as.numeric(cut(x,
                                 breaks = c(-Inf, -1.5, 1.5, Inf)))
                })
dat <- data.frame(dat)
head(dat)
#>   x1 x2 x3 x4 x5 x6 x7 x8 x9
#> 1  2  2  1  2  2  2  2  2  2
#> 2  2  2  2  2  2  2  2  2  3
#> 3  2  2  2  1  1  1  2  1  2
#> 4  2  2  2  2  2  2  2  2  2
#> 5  2  2  2  2  2  2  2  2  2
#> 6  2  2  2  1  2  2  2  2  3

fit <- cfa(HS.model, data = dat, ord=T)

plot_model <- semPlotModel(fit)

# Standardized Solution: Use the p-value in the unstandardized solution

plot_sem <- semPlot::semPaths(object = plot_model,
                              thresholds = F,
                              whatLabels = "stand",
                              intercepts = F,
                              style = "ram",
                              mar = c(3, 3, 10, 3),
                              edge.label.cex = .8,
                              DoNotPlot = T)

p_p <- mark_sig(semPaths_plot = plot_sem,
                object = fit)
plot(p_p)
title("Use p-values from the (unstandardized) solution")


# Check the results using semhelpinghands::add_sig()
parameterEstimates(fit, standardized = TRUE, ci = FALSE) |>
  add_sig() |>
  filter_by(op = c("=~", "~~"))
#>        lhs op     rhs   est sig    se     z pvalue std.lv std.all
#> 1   visual =~      x1 1.000     0.000    NA     NA  0.986   0.986
#> 2   visual =~      x2 0.388 *   0.174 2.227  0.026  0.382   0.382
#> 3   visual =~      x3 0.583 **  0.195 2.982  0.003  0.575   0.575
#> 4  textual =~      x4 1.000     0.000    NA     NA  0.888   0.888
#> 5  textual =~      x5 0.891 *** 0.159 5.598  0.000  0.791   0.791
#> 6  textual =~      x6 0.944 *** 0.185 5.098  0.000  0.838   0.838
#> 7    speed =~      x7 1.000     0.000    NA     NA  0.432   0.432
#> 8    speed =~      x8 1.907 *   0.791 2.411  0.016  0.823   0.823
#> 9    speed =~      x9 1.274 *   0.532 2.394  0.017  0.550   0.550
#> 28      x1 ~~      x1 0.029     0.000    NA     NA  0.029   0.029
#> 29      x2 ~~      x2 0.854     0.000    NA     NA  0.854   0.854
#> 30      x3 ~~      x3 0.670     0.000    NA     NA  0.670   0.670
#> 31      x4 ~~      x4 0.211     0.000    NA     NA  0.211   0.211
#> 32      x5 ~~      x5 0.374     0.000    NA     NA  0.374   0.374
#> 33      x6 ~~      x6 0.297     0.000    NA     NA  0.297   0.297
#> 34      x7 ~~      x7 0.814     0.000    NA     NA  0.814   0.814
#> 35      x8 ~~      x8 0.322     0.000    NA     NA  0.322   0.322
#> 36      x9 ~~      x9 0.698     0.000    NA     NA  0.698   0.698
#> 37  visual ~~  visual 0.971 **  0.317 3.061  0.002  1.000   1.000
#> 38 textual ~~ textual 0.789 *** 0.169 4.669  0.000  1.000   1.000
#> 39   speed ~~   speed 0.186     0.128 1.455  0.146  1.000   1.000
#> 40  visual ~~ textual 0.389 *** 0.088 4.433  0.000  0.444   0.444
#> 41  visual ~~   speed 0.185 *   0.083 2.230  0.026  0.434   0.434
#> 42 textual ~~   speed 0.006     0.004 1.580  0.114  0.015   0.015

# Standardized Solution: Use the p-value in the standardized solution

p_p2 <- mark_sig(semPaths_plot = plot_sem,
                 object = fit,
                 std_type = "std.all")
plot(p_p2)
title("Use p-values from the standardized solution")


# Check the results
standardizedSolution(fit, ci = FALSE) |>
  add_sig() |>
  filter_by(op = c("=~", "~~"))
#>        lhs op     rhs est.std sig    se     z pvalue
#> 1   visual =~      x1   0.986 *** 0.161 6.122  0.000
#> 2   visual =~      x2   0.382 **  0.140 2.731  0.006
#> 3   visual =~      x3   0.575 *** 0.140 4.118  0.000
#> 4  textual =~      x4   0.888 *** 0.095 9.337  0.000
#> 5  textual =~      x5   0.791 *** 0.084 9.395  0.000
#> 6  textual =~      x6   0.838 *** 0.119 7.045  0.000
#> 7    speed =~      x7   0.432 **  0.148 2.910  0.004
#> 8    speed =~      x8   0.823 *** 0.174 4.726  0.000
#> 9    speed =~      x9   0.550 *** 0.162 3.387  0.001
#> 28      x1 ~~      x1   0.029     0.317 0.090  0.928
#> 29      x2 ~~      x2   0.854 *** 0.107 7.979  0.000
#> 30      x3 ~~      x3   0.670 *** 0.160 4.178  0.000
#> 31      x4 ~~      x4   0.211     0.169 1.249  0.212
#> 32      x5 ~~      x5   0.374 **  0.133 2.806  0.005
#> 33      x6 ~~      x6   0.297     0.200 1.488  0.137
#> 34      x7 ~~      x7   0.814 *** 0.128 6.354  0.000
#> 35      x8 ~~      x8   0.322     0.287 1.124  0.261
#> 36      x9 ~~      x9   0.698 *** 0.179 3.907  0.000
#> 37  visual ~~  visual   1.000     0.000    NA     NA
#> 38 textual ~~ textual   1.000     0.000    NA     NA
#> 39   speed ~~   speed   1.000     0.000    NA     NA
#> 40  visual ~~ textual   0.444 *** 0.110 4.035  0.000
#> 41  visual ~~   speed   0.434 **  0.143 3.041  0.002
#> 42 textual ~~   speed   0.015     0.011 1.358  0.175

# No standardization

plot_sem_nostd <- semPlot::semPaths(object = plot_model,
                              thresholds = F,
                              whatLabels = "est",
                              intercepts = F,
                              style = "ram",
                              mar = c(3, 3, 10, 3),
                              edge.label.cex = .8,
                              DoNotPlot = T)

p_p_nostd <- mark_sig(semPaths_plot = plot_sem_nostd,
                object = fit)
plot(p_p_nostd)
title("(Unstandardized) Solution")


# Check the results
parameterEstimates(fit, ci = FALSE) |>
  add_sig() |>
  filter_by(op = c("=~", "~~"))
#>        lhs op     rhs   est sig    se     z pvalue
#> 1   visual =~      x1 1.000     0.000    NA     NA
#> 2   visual =~      x2 0.388 *   0.174 2.227  0.026
#> 3   visual =~      x3 0.583 **  0.195 2.982  0.003
#> 4  textual =~      x4 1.000     0.000    NA     NA
#> 5  textual =~      x5 0.891 *** 0.159 5.598  0.000
#> 6  textual =~      x6 0.944 *** 0.185 5.098  0.000
#> 7    speed =~      x7 1.000     0.000    NA     NA
#> 8    speed =~      x8 1.907 *   0.791 2.411  0.016
#> 9    speed =~      x9 1.274 *   0.532 2.394  0.017
#> 28      x1 ~~      x1 0.029     0.000    NA     NA
#> 29      x2 ~~      x2 0.854     0.000    NA     NA
#> 30      x3 ~~      x3 0.670     0.000    NA     NA
#> 31      x4 ~~      x4 0.211     0.000    NA     NA
#> 32      x5 ~~      x5 0.374     0.000    NA     NA
#> 33      x6 ~~      x6 0.297     0.000    NA     NA
#> 34      x7 ~~      x7 0.814     0.000    NA     NA
#> 35      x8 ~~      x8 0.322     0.000    NA     NA
#> 36      x9 ~~      x9 0.698     0.000    NA     NA
#> 37  visual ~~  visual 0.971 **  0.317 3.061  0.002
#> 38 textual ~~ textual 0.789 *** 0.169 4.669  0.000
#> 39   speed ~~   speed 0.186     0.128 1.455  0.146
#> 40  visual ~~ textual 0.389 *** 0.088 4.433  0.000
#> 41  visual ~~   speed 0.185 *   0.083 2.230  0.026
#> 42 textual ~~   speed 0.006     0.004 1.580  0.114

# Plot standardized solution
# Use p-values after standardization
# Remove residual p-values

std <- standardizedSolution(fit) |>
         filter_by(op = c("=~", "~~"))
std[std$lhs %in% lavNames(fit, "ov") &
    std$rhs %in% lavNames(fit, "ov"), "pvalue"] <- NA
std
#>        lhs op     rhs est.std    se     z pvalue ci.lower ci.upper
#> 1   visual =~      x1   0.986 0.161 6.122  0.000    0.670    1.301
#> 2   visual =~      x2   0.382 0.140 2.731  0.006    0.108    0.657
#> 3   visual =~      x3   0.575 0.140 4.118  0.000    0.301    0.848
#> 4  textual =~      x4   0.888 0.095 9.337  0.000    0.702    1.075
#> 5  textual =~      x5   0.791 0.084 9.395  0.000    0.626    0.956
#> 6  textual =~      x6   0.838 0.119 7.045  0.000    0.605    1.072
#> 7    speed =~      x7   0.432 0.148 2.910  0.004    0.141    0.722
#> 8    speed =~      x8   0.823 0.174 4.726  0.000    0.482    1.165
#> 9    speed =~      x9   0.550 0.162 3.387  0.001    0.232    0.868
#> 28      x1 ~~      x1   0.029 0.317 0.090     NA   -0.593    0.651
#> 29      x2 ~~      x2   0.854 0.107 7.979     NA    0.644    1.064
#> 30      x3 ~~      x3   0.670 0.160 4.178     NA    0.356    0.984
#> 31      x4 ~~      x4   0.211 0.169 1.249     NA   -0.120    0.542
#> 32      x5 ~~      x5   0.374 0.133 2.806     NA    0.113    0.635
#> 33      x6 ~~      x6   0.297 0.200 1.488     NA   -0.094    0.688
#> 34      x7 ~~      x7   0.814 0.128 6.354     NA    0.563    1.065
#> 35      x8 ~~      x8   0.322 0.287 1.124     NA   -0.240    0.884
#> 36      x9 ~~      x9   0.698 0.179 3.907     NA    0.348    1.048
#> 37  visual ~~  visual   1.000 0.000    NA     NA    1.000    1.000
#> 38 textual ~~ textual   1.000 0.000    NA     NA    1.000    1.000
#> 39   speed ~~   speed   1.000 0.000    NA     NA    1.000    1.000
#> 40  visual ~~ textual   0.444 0.110 4.035  0.000    0.228    0.660
#> 41  visual ~~   speed   0.434 0.143 3.041  0.002    0.154    0.714
#> 42 textual ~~   speed   0.015 0.011 1.358  0.175   -0.007    0.038

p_p_std_p <- mark_sig(semPaths_plot = plot_sem,
                ests = std)
plot(p_p_std_p)
title("Standardized solution p-values\nP-values for residuals removed")

Created on 2024-11-06 with reprex v2.1.1

jorgesinval commented 3 weeks ago

Yes, lavaan's default parameterization ('delta') does not allow us, for example, to constrain the residual variances of categorical indicators. For that, we need the 'theta' parameterization. Thank you, now the default behavior is the one I would expect from the function.