sfcheung / semfindr

A find(e)r of influential cases and outliers in SEM
https://sfcheung.github.io/semfindr/
GNU General Public License v3.0

`lavaan_rerun()` failed when using listwise deletion #110

Closed marklhc closed 1 month ago

marklhc commented 3 months ago

I was using semfindr on a real data set, and ran into an unexpected error when using listwise deletion. Here's an example:

library(lavaan)
#> This is lavaan 0.6-17
#> lavaan is FREE software! Please report any bugs.
library(semfindr)

# From test-mahalanobis_single_missing.R

mod <-
  '
iv1 ~~ iv2
m1 ~ iv1 + iv2
dv ~ m1
'

dat <- pa_dat

dat0 <- dat[1:50, ]
dat0[1, 2] <- dat0[2, 3] <- dat0[3, 4] <- dat0[4, ] <- NA

# Fit with FIML
fit0 <- lavaan::sem(mod, data = dat0, missing = "fiml.x")
#> Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some cases are empty and will be ignored:
#>   4
lavaan_rerun(fit0)  # no problem
#> The expected CPU time is 8.6 second(s).
#> Could be faster if run in parallel.
#> === lavaan_rerun Output ===
#> Call:
#> lavaan_rerun(fit = fit0)
#> Number of reruns: 50
#> Number of reruns that converged (solution found): 50
#> Number of reruns that failed to converge (solution not found): 0
#> Number of reruns that passed post.check of lavaan: 50
#> Number of reruns that failed post.check of lavaan: 0
#> Number of reruns that both converged and passed post.check: 50
#> Number of reruns that either did not converge or failed post.check: 0

# Fit with listwise deletion
fit0_lw <- lavaan::sem(mod, data = dat0)
lavaan_rerun(fit0_lw)  # error
#> The expected CPU time is 3.77 second(s).
#> Could be faster if run in parallel.
#> Error in lavData(data = data, group = group, cluster = cluster, ov.names = OV.NAMES, : lavaan ERROR: data= argument is not a data.fame, but of class 'numeric'
#> Timing stopped at: 6.347 4.795 1.405

Created on 2024-03-14 with reprex v2.1.0

This happens because lavaan_rerun() uses the case_id to select the rows to rerun. With listwise deletion, cases with missing data are dropped, so a case_id can point to a row beyond the number of cases actually used. I made a quick fix and will submit a pull request with a test.
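To make the mechanism concrete, here is a minimal base-R sketch of this kind of indexing mismatch. It is not semfindr's actual code: the toy data and the names `dat_used`, `case_ids`, `bad`, `pos`, and `ok` are hypothetical, and it only assumes that case IDs are the original row numbers.

```r
# Toy data (hypothetical): 6 cases, two of which have a missing value.
dat <- data.frame(x = c(1, NA, 3, 4, 5, 6),
                  y = c(2, 3, 4, NA, 6, 7))

# Listwise deletion keeps only the complete cases (4 rows here).
dat_used <- dat[complete.cases(dat), ]

# Assumption: case IDs are the original row numbers of the kept cases.
case_ids <- as.integer(rownames(dat_used))

# Problem: using original row numbers as positions in the reduced data.
# IDs larger than nrow(dat_used) select rows that do not exist.
bad <- dat_used[case_ids, ]

# One possible remedy: map each ID to its position in the reduced data.
pos <- match(as.character(case_ids), rownames(dat_used))
ok <- dat_used[pos, ]

print(nrow(dat_used))
print(bad)
print(ok)
```

In this sketch, `bad` contains all-`NA` rows for the IDs that exceed the number of retained cases, while `ok` recovers exactly the retained cases. The actual fix in the package may differ.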

sfcheung commented 3 months ago

There are a few more issues that need to be fixed. Working on them in https://github.com/sfcheung/semfindr/tree/more_on_iss110

sfcheung commented 3 months ago

Added tests for listwise deletion with multiple-group models and with selection by case_ids. Will add more tests this weekend.

sfcheung commented 3 months ago

(I think I accidentally closed it, so I reopened it. I should wait for a while before closing it, just in case.)

sfcheung commented 1 month ago

Fixed here: https://github.com/sfcheung/semfindr/pull/113