sfcheung / semfindr

A find(e)r of influential cases and outliers in SEM
https://sfcheung.github.io/semfindr/
GNU General Public License v3.0

Add a suggestion to the error message in `est_change()` (or `pars_id()`) when users specify `parameters` in the form of `.g2`, etc #87

Closed marklhc closed 12 months ago

marklhc commented 1 year ago

I got caught by this so I think it'd be helpful to tell users to use the right labels so that they don't need to go through the documentation of pars_id() and est_change().

sfcheung commented 1 year ago

Thanks! Will work on it soon. :)

marklhc commented 1 year ago

Thanks @sfcheung!

sfcheung commented 12 months ago

Hi @marklhc, I found that I had misinterpreted the error you encountered:

library(semfindr)
data(cfa_dat_mg)
head(cfa_dat_mg)
#>           x1          x2         x3         x4          x5         x6     gp
#> 1 -1.2690938 -1.32550591 -0.6285011 -0.7215738 -2.81187575 -0.8436144 GroupA
#> 2 -0.0304962  0.05275281  1.1867437 -0.5193213 -1.92131327 -0.1802093 GroupA
#> 3 -1.0134542 -0.05071220 -0.7482927 -1.0483134  0.98383479 -0.1672748 GroupA
#> 4  0.1647769 -1.75609084 -1.6568902 -0.8036030 -2.14339421 -0.5066695 GroupA
#> 5  1.9583190  2.25384865  0.4972562  1.3103522  0.36280304  0.9186370 GroupA
#> 6 -1.2487913 -1.48482257 -0.7739649  0.8942307 -0.03870106  0.0403398 GroupA

mod <-
"
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
"

library(lavaan)
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.

fit_config <- cfa(mod, cfa_dat_mg,
                  group = "gp")
lavInspect(fit_config, "group.label")
#> [1] "GroupA" "GroupB"
rerun_config <- lavaan_rerun(fit_config, to_rerun = c(1:2, 99:100))
#> The expected CPU time is 0.56 second(s).
#> Could be faster if run in parallel.

est_change(rerun_config,
           parameters = "f1=~x2.g2")
#> 
#> -- Standardized Case Influence on Parameter Estimates --
#> 
#>     f1=~x2.g2   gcd
#> 100    -0.742 0.550
#> 99      0.153 0.023
#> 1       0.000 0.000
#> 2       0.000 0.000
#> 
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - All stored cases are displayed.
#> - Cases sorted by generalized Cook's distance.
est_change(rerun_config,
           parameters = "f1=~x2.GroupB")
#> 
#> -- Standardized Case Influence on Parameter Estimates --
#> 
#>     f1=~x2.g2   gcd
#> 100    -0.742 0.550
#> 99      0.153 0.023
#> 1       0.000 0.000
#> 2       0.000 0.000
#> 
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - All stored cases are displayed.
#> - Cases sorted by generalized Cook's distance.

est_change(rerun_config,
           parameters = "f1=~x2.g1")
#> 
#> -- Standardized Case Influence on Parameter Estimates --
#> 
#>     f1=~x2   gcd
#> 1   -0.066 0.004
#> 2    0.006 0.000
#> 100  0.000 0.000
#> 99   0.000 0.000
#> 
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - All stored cases are displayed.
#> - Cases sorted by generalized Cook's distance.
est_change(rerun_config,
           parameters = "f1=~x2.GroupA")
#> 
#> -- Standardized Case Influence on Parameter Estimates --
#> 
#>     f1=~x2   gcd
#> 1   -0.066 0.004
#> 2    0.006 0.000
#> 100  0.000 0.000
#> 99   0.000 0.000
#> 
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - All stored cases are displayed.
#> - Cases sorted by generalized Cook's distance.

est_change(rerun_config,
           parameters = "f1=~x2")
#> 
#> -- Standardized Case Influence on Parameter Estimates --
#> 
#>     f1=~x2 f1=~x2.g2   gcd
#> 100  0.000    -0.742 0.550
#> 99   0.000     0.153 0.023
#> 1   -0.066     0.000 0.004
#> 2    0.006     0.000 0.000
#> 
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - All stored cases are displayed.
#> - Cases sorted by generalized Cook's distance.

# Error expected
est_change(rerun_config,
           parameters = "f1=~x2.g3")
#> Error in pars_id(parameters, fit = fit0, where = "coef"): No parameters selected. Please check the parameter argument.
est_change(rerun_config,
           parameters = "f1=~x2.GroupC")
#> Error in pars_id(parameters, fit = fit0, where = "coef"): No parameters selected. Please check the parameter argument.

Created on 2023-06-27 with reprex v2.0.2

Using suffixes like .g1 and .g2 worked. Could you post an example of the error you encountered so I can follow up? Thanks.

marklhc commented 12 months ago

Hi @sfcheung, I should have clarified. Here's an example:

library(semfindr)
library(lavaan)
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.
## The famous Holzinger and Swineford (1939) example
## First loadings
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit_config <- cfa(HS.model, data = HolzingerSwineford1939,
                  group = "school")
rerun_config <- lavaan_rerun(fit_config)
#> The expected CPU time is 30.1 second(s).
#> Could be faster if run in parallel.
est_change(rerun_config, parameters = c("=~.g2"))
#> Error in pars_id(parameters, fit = fit0, where = "coef"): No parameters selected. Please check the parameter argument.
est_change(rerun_config, parameters = c("=~.Grant-White"))
#> 
#> -- Standardized Case Influence on Parameter Estimates --
#> 
#>     visual=~x2.g2 visual=~x3.g2 textual=~x5.g2 textual=~x6.g2 speed=~x8.g2
#> 163         0.359         0.427         -0.235          0.191       -0.235
#> 268         0.565         0.688         -0.045         -0.016       -0.036
#> 252        -0.591        -0.286         -0.248          0.071        0.048
#> 168         0.158         0.182         -0.001         -0.478       -0.042
#> 177         0.026        -0.160         -0.113         -0.046        0.286
#> 180        -0.128        -0.100          0.001          0.039        0.416
#> 196         0.215         0.312         -0.280         -0.041        0.038
#> 194        -0.169        -0.126          0.259         -0.123       -0.186
#> 243        -0.053        -0.141          0.018         -0.024       -0.172
#> 262        -0.012        -0.032         -0.234          0.015        0.178
#>     speed=~x9.g2   gcd
#> 163        0.513 1.007
#> 268       -0.077 0.583
#> 252       -0.002 0.460
#> 168       -0.124 0.351
#> 177        0.471 0.274
#> 180        0.045 0.238
#> 196       -0.133 0.229
#> 194       -0.011 0.220
#> 243        0.250 0.211
#> 262        0.346 0.196
#> 
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 10 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance.
est_change(rerun_config, parameters = c("=~.Pasteur"))
#> 
#> -- Standardized Case Influence on Parameter Estimates --
#> 
#>     visual=~x2 visual=~x3 textual=~x5 textual=~x6 speed=~x8 speed=~x9   gcd
#> 144     -0.108     -0.073      -0.460       0.310    -0.143     0.041 0.699
#> 131      0.638      0.651       0.077       0.021     0.220     0.296 0.648
#> 105      0.446      0.285       0.446       0.562     0.197     0.236 0.620
#> 98      -0.125     -0.374      -0.057      -0.366    -0.148     0.072 0.353
#> 78      -0.261     -0.206      -0.109      -0.407     0.173     0.180 0.300
#> 109     -0.282     -0.327      -0.322      -0.049    -0.127     0.063 0.289
#> 107     -0.265     -0.474      -0.150      -0.155    -0.117    -0.129 0.279
#> 143      0.362      0.132       0.029       0.271     0.175     0.203 0.273
#> 123      0.055      0.090       0.178       0.160     0.341     0.450 0.269
#> 115     -0.235     -0.248      -0.030      -0.094    -0.028    -0.321 0.218
#> 
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 10 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance.

Created on 2023-06-27 with reprex v2.0.2

As you can see, using .g2 does not work in this case, but using the group name works. The syntax is based on the one in the vignette. Thanks.

sfcheung commented 12 months ago

@marklhc, thanks for the example. I fixed this issue in the branch fix_group_labels (https://github.com/sfcheung/semfindr/commit/30317eac7144f28253ce7acd2499f60fadb6d798). It was my oversight: I had added support for .gX, but only in one internal function. It should now support both .grouplabel and .gX. I also added a few tests to the test files.
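
To make the idea of accepting both suffix forms concrete, here is a minimal base R sketch. It is not the code from the commit; `gx_to_label()` is a hypothetical helper that rewrites a trailing `.gX` to the matching group label, assuming the labels are supplied in group order.

```r
# Hypothetical helper for illustration only; not the code from the commit.
# Rewrites a trailing ".gX" (X = group number) to ".<group label>".
gx_to_label <- function(par, group_labels) {
  m <- regmatches(par, regexec("^(.*)\\.g([0-9]+)$", par))[[1]]
  if (length(m) == 0) {
    return(par)  # no ".gX" suffix; leave the string untouched
  }
  idx <- as.integer(m[3])
  if (idx < 1 || idx > length(group_labels)) {
    stop("Group number ", idx, " is out of range.")
  }
  paste0(m[2], ".", group_labels[idx])
}

# Group order implied by the output earlier in this thread:
# Pasteur is the first group, Grant-White the second.
labels <- c("Pasteur", "Grant-White")
gx_to_label("=~.g2", labels)
gx_to_label("visual=~x2.g1", labels)
gx_to_label("=~.Pasteur", labels)  # already a label suffix; unchanged
```

With a mapping like this, both forms would resolve to the same selection, and a group number that does not exist (such as `.g3` here) fails early.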

# Check out the branch `fix_group_labels` locally before using load_all()
devtools::load_all("J:/GitHub/semfindr")
#> ℹ Loading semfindr
library(lavaan)
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit_config <- cfa(HS.model, data = HolzingerSwineford1939,
                  group = "school")
rerun_config <- lavaan_rerun(fit_config)
#> The expected CPU time is 57.19 second(s).
#> Could be faster if run in parallel.
est_change(rerun_config, parameters = c("=~.g1"))
#> 
#> -- Standardized Case Influence on Parameter Estimates --
#> 
#>     visual=~x2 visual=~x3 textual=~x5 textual=~x6 speed=~x8 speed=~x9   gcd
#> 144     -0.108     -0.073      -0.460       0.310    -0.143     0.041 0.699
#> 131      0.638      0.651       0.077       0.021     0.220     0.296 0.648
#> 105      0.446      0.285       0.446       0.562     0.197     0.236 0.620
#> 98      -0.125     -0.374      -0.057      -0.366    -0.148     0.072 0.353
#> 78      -0.261     -0.206      -0.109      -0.407     0.173     0.180 0.300
#> 109     -0.282     -0.327      -0.322      -0.049    -0.127     0.063 0.289
#> 107     -0.265     -0.474      -0.150      -0.155    -0.117    -0.129 0.279
#> 143      0.362      0.132       0.029       0.271     0.175     0.203 0.273
#> 123      0.055      0.090       0.178       0.160     0.341     0.450 0.269
#> 115     -0.235     -0.248      -0.030      -0.094    -0.028    -0.321 0.218
#> 
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 10 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance.
est_change(rerun_config, parameters = c("=~.Pasteur"))
#> 
#> -- Standardized Case Influence on Parameter Estimates --
#> 
#>     visual=~x2 visual=~x3 textual=~x5 textual=~x6 speed=~x8 speed=~x9   gcd
#> 144     -0.108     -0.073      -0.460       0.310    -0.143     0.041 0.699
#> 131      0.638      0.651       0.077       0.021     0.220     0.296 0.648
#> 105      0.446      0.285       0.446       0.562     0.197     0.236 0.620
#> 98      -0.125     -0.374      -0.057      -0.366    -0.148     0.072 0.353
#> 78      -0.261     -0.206      -0.109      -0.407     0.173     0.180 0.300
#> 109     -0.282     -0.327      -0.322      -0.049    -0.127     0.063 0.289
#> 107     -0.265     -0.474      -0.150      -0.155    -0.117    -0.129 0.279
#> 143      0.362      0.132       0.029       0.271     0.175     0.203 0.273
#> 123      0.055      0.090       0.178       0.160     0.341     0.450 0.269
#> 115     -0.235     -0.248      -0.030      -0.094    -0.028    -0.321 0.218
#> 
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 10 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance.
est_change(rerun_config, parameters = c("=~.g2"))
#> 
#> -- Standardized Case Influence on Parameter Estimates --
#> 
#>     visual=~x2.g2 visual=~x3.g2 textual=~x5.g2 textual=~x6.g2 speed=~x8.g2
#> 163         0.359         0.427         -0.235          0.191       -0.235
#> 268         0.565         0.688         -0.045         -0.016       -0.036
#> 252        -0.591        -0.286         -0.248          0.071        0.048
#> 168         0.158         0.182         -0.001         -0.478       -0.042
#> 177         0.026        -0.160         -0.113         -0.046        0.286
#> 180        -0.128        -0.100          0.001          0.039        0.416
#> 196         0.215         0.312         -0.280         -0.041        0.038
#> 194        -0.169        -0.126          0.259         -0.123       -0.186
#> 243        -0.053        -0.141          0.018         -0.024       -0.172
#> 262        -0.012        -0.032         -0.234          0.015        0.178
#>     speed=~x9.g2   gcd
#> 163        0.513 1.007
#> 268       -0.077 0.583
#> 252       -0.002 0.460
#> 168       -0.124 0.351
#> 177        0.471 0.274
#> 180        0.045 0.238
#> 196       -0.133 0.229
#> 194       -0.011 0.220
#> 243        0.250 0.211
#> 262        0.346 0.196
#> 
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 10 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance.
est_change(rerun_config, parameters = c("=~.Grant-White"))
#> 
#> -- Standardized Case Influence on Parameter Estimates --
#> 
#>     visual=~x2.g2 visual=~x3.g2 textual=~x5.g2 textual=~x6.g2 speed=~x8.g2
#> 163         0.359         0.427         -0.235          0.191       -0.235
#> 268         0.565         0.688         -0.045         -0.016       -0.036
#> 252        -0.591        -0.286         -0.248          0.071        0.048
#> 168         0.158         0.182         -0.001         -0.478       -0.042
#> 177         0.026        -0.160         -0.113         -0.046        0.286
#> 180        -0.128        -0.100          0.001          0.039        0.416
#> 196         0.215         0.312         -0.280         -0.041        0.038
#> 194        -0.169        -0.126          0.259         -0.123       -0.186
#> 243        -0.053        -0.141          0.018         -0.024       -0.172
#> 262        -0.012        -0.032         -0.234          0.015        0.178
#>     speed=~x9.g2   gcd
#> 163        0.513 1.007
#> 268       -0.077 0.583
#> 252       -0.002 0.460
#> 168       -0.124 0.351
#> 177        0.471 0.274
#> 180        0.045 0.238
#> 196       -0.133 0.229
#> 194       -0.011 0.220
#> 243        0.250 0.211
#> 262        0.346 0.196
#> 
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 10 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance.

Created on 2023-06-28 with reprex v2.0.2

Despite what I wrote in the help page, pars_id() actually supports .g2 and even .g1, although lavaan does not add .g1 to parameters in the first group. I personally do not recommend using .gX, because which group is first, which is second, and so on depends on the order in which the labels appear in the data, so the numbering can change across analyses.
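
A small base R sketch of why the numbering is fragile, assuming (as described above) that groups are numbered by the order in which their labels first appear in the data. `group_suffixes()` is a hypothetical helper, not part of semfindr or lavaan.

```r
# Hypothetical helper, not part of semfindr or lavaan:
# map group labels to .gX-style suffixes by order of first appearance.
group_suffixes <- function(group) {
  labels <- unique(as.character(group))
  setNames(paste0(".g", seq_along(labels)), labels)
}

school <- c("Pasteur", "Pasteur", "Grant-White", "Grant-White")
group_suffixes(school)

# Reordering the rows swaps which group is .g1 and which is .g2
group_suffixes(rev(school))
```

For a fitted model, `lavInspect(fit, "group.label")` (used earlier in this thread) shows the actual order.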

That said, .g2, .g3, etc. are what users see in some outputs of lavaan. Therefore, I will amend the help page to remark that .gX can be used, though not recommended.

Please let me know whether this fixes the bug you found. If yes, I will push it to devel and then to main later.

Thanks a lot. :)

marklhc commented 12 months ago

Thank you @sfcheung. It works now.

sfcheung commented 12 months ago

Fixed in #95 (version 0.1.5.2)

(Note: R CMD checks are failing, but this is due to GitHub Actions, not the package. Similar problems occurred in other packages at the time of writing.)