sfcheung / semfindr

A find(e)r of influential cases and outliers in SEM
https://sfcheung.github.io/semfindr/
GNU General Public License v3.0
1 stars 0 forks source link

Error in `test-lavaan_rerun_warnings.R` #47

Closed marklhc closed 2 years ago

marklhc commented 2 years ago

Hi @sfcheung, when I run tests for the ld branch, I run into an error in this file, where it resulted in, on line 24,

Error in lavaan_rerun(fit) : 
  The solution is not admissible by lavaan post.check. Check the SEM results first.

Not sure whether it's only on my computer or you're already aware of it, but bringing it up in case.

Mark

sfcheung commented 2 years ago

Interesting. I did not have this error message. Before doing reruns, lavaan_rerun() checks the fit by lavaan_rerun_check(). I believe this is the source of the package.

May you try this on your computer? Right after the following:

fit <- lavaan::cfa(mod, dat)

Run this:

lavInspect(fit, "post.check")

If this is FALSE, it is due to the generated data and lavaan. lavaan_rerun() just reports this. Somehow the data and/or the lavaan results in your computer are different from those in mine.

If this is TRUE, then something's wrong with the check. I will try to figure out where the problem is.

marklhc commented 2 years ago

It returns FALSE, as shown below. I think it's not the first time I see the same seed generating different data on different computers, especially when it's on different platforms. Maybe you can save your data directly?

library(testthat)
library(lavaan)
#> This is lavaan 0.6-12
#> lavaan is FREE software! Please report any bugs.
library(semfindr)

#context("Test handling of reruns with warnings")

mod_p <- 
'
f1 =~  .7*x1 + .6*x2 + .8*x3 + .3*x5
f2 =~  .2*x1 + .6*x4 + .8*x5 + .7*x6
f1 ~~ .2*f2
'
mod <- 
'
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
'

# generate data
set.seed(48157438)
dat <- lavaan::simulateData(mod_p, sample.nobs =  45L)
fit <- lavaan::cfa(mod, dat)
#> Warning in lav_object_post_check(object): lavaan WARNING: some estimated ov
#> variances are negative
lavInspect(fit, "post.check")
#> Warning in lav_object_post_check(object): lavaan WARNING: some estimated ov
#> variances are negative
#> [1] FALSE

Created on 2022-07-28 by the reprex package (v2.0.1)

sfcheung commented 2 years ago

Interesting. Attached is the data generated from the code, up to dat .... and I got no warning from lavaan().

test_sf.zip

Can you test on your computer the following code, to see if you get the same random numbers?

set.seed(98753)
rnorm(10)
#>  [1]  0.7846224 -0.6324866  0.3927858 -1.4378575  0.5113181  0.6154332
#>  [7] -0.4046077  1.4039016 -0.4613423 -0.5565554

I tested this in R 3.5.2 and a developmental version of R (Jul-22, which is for testing packages and so I customized nothing) and got the same results in both cases. If the seed cannot ensure reproducibility with default settings, this is interesting and important.

It is also possible the issue occurred in calling cfa(). But let's rule out the possibility of having different datasets first.

marklhc commented 2 years ago

I think the random number generation gives the same numbers:

set.seed(98753)
rnorm(10)
#>  [1]  0.7846224 -0.6324866  0.3927858 -1.4378575  0.5113181  0.6154332
#>  [7] -0.4046077  1.4039016 -0.4613423 -0.5565554

Created on 2022-07-29 by the reprex package (v2.0.1)

But the data I generated using lavaan::simulateData() with the same seed are different from yours

mod_p <- 
'
f1 =~  .7*x1 + .6*x2 + .8*x3 + .3*x5
f2 =~  .2*x1 + .6*x4 + .8*x5 + .7*x6
f1 ~~ .2*f2
'

# generate data
set.seed(48157438)
dat <- lavaan::simulateData(mod_p, sample.nobs =  45L)
head(dat)
#>           x1         x2         x3         x5         x4         x6
#> 1 -0.6239875  1.2016118 -0.6306324  1.6920402  0.1324854  0.8078510
#> 2  1.7986030 -0.3134413 -1.3996096  0.6500886 -0.4508315  0.7495345
#> 3  0.5828103  1.6414731  0.7941053 -0.4218119 -0.2108365  0.7482752
#> 4 -3.1224430  0.2907512 -1.1214294 -2.3263483 -0.1152664 -1.8820520
#> 5  0.4170535  0.1078207 -0.4241836  1.5641871  0.4881869  0.8373156
#> 6  1.7454050  0.4031509  0.7839998  1.5042781  1.6428226  0.5088278

Created on 2022-07-29 by the reprex package (v2.0.1)

My computer has version 0.6-12 of lavaan. I then tried using my office computer, which had version 0.6-11, and it generated the same data as yours. Upgrading the package to 0.6-12 on my office computer still gave the same data as yours. So it's probably not because of the package version. Perhaps something different on my own laptop.

But then I tried using another computer on campus, and the weird thing is that it generated a third different data set, which I didn't expect. It also has lavaan version 0.6-12. All three computers are on Ubuntu Linux system. So an initial guess is that the function in lavaan may depend on some other packages that can give different results. An option is that we try to simulate data on our own and see whether that is more replicable.

I also tested MASS::mvrnorm(), which lavaan::simulateData() called, and it gives the same result on all three computers. So I wasn't quite sure where things went different on these computers.

sfcheung commented 2 years ago

Maybe it is about lavaan. May you try one more test on your computer, this time using the seed argument in lavaan::simulateData()? (I ran the following on a fresh session of R with only reprex loaded before running the code.)

mod_p <-
'
f1 =~  .7*x1 + .6*x2 + .8*x3 + .3*x5
f2 =~  .2*x1 + .6*x4 + .8*x5 + .7*x6
f1 ~~ .2*f2
'
dat <- lavaan::simulateData(mod_p, sample.nobs =  45L, seed = 48157438)
head(dat)
#>           x1         x2         x3         x5         x4          x6
#> 1 -1.2366642  0.7259477  0.5491608  0.9268323  0.5527255  1.49721893
#> 2  0.3268405  1.4969577 -1.6606505  1.1595120  0.1605944  0.06376055
#> 3  0.7278165  0.7564645  1.0609042  0.2983937 -1.2416582  0.76731424
#> 4 -2.1528254 -1.6096284 -0.1406489 -3.4034447 -1.1064092 -0.07856727
#> 5 -0.1264448  0.5489839 -0.1651137  1.2591702  1.0145481  0.89002483
#> 6  1.8225447  0.8487155  0.5422317  1.0778713  1.5752090  1.03206444
sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Chinese (Traditional)_Hong Kong SAR.utf8 
#> [2] LC_CTYPE=Chinese (Traditional)_Hong Kong SAR.utf8   
#> [3] LC_MONETARY=Chinese (Traditional)_Hong Kong SAR.utf8
#> [4] LC_NUMERIC=C                                        
#> [5] LC_TIME=Chinese (Traditional)_Hong Kong SAR.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] knitr_1.39        magrittr_2.0.3    MASS_7.3-57       mnormt_2.1.0     
#>  [5] pbivnorm_0.6.0    R.cache_0.15.0    rlang_1.0.3       fastmap_1.1.0    
#>  [9] fansi_1.0.3       stringr_1.4.0     styler_1.7.0      highr_0.9        
#> [13] tools_4.2.1       parallel_4.2.1    xfun_0.31         R.oo_1.25.0      
#> [17] utf8_1.2.2        cli_3.3.0         withr_2.5.0       htmltools_0.5.2  
#> [21] ellipsis_0.3.2    yaml_2.3.5        digest_0.6.29     tibble_3.1.7     
#> [25] lifecycle_1.0.1   lavaan_0.6-12     purrr_0.3.4       R.utils_2.12.0   
#> [29] vctrs_0.4.1       fs_1.5.2          glue_1.6.2        evaluate_0.15    
#> [33] rmarkdown_2.14    reprex_2.0.1      stringi_1.7.6     compiler_4.2.1   
#> [37] pillar_1.8.0      R.methodsS3_1.8.2 stats4_4.2.1      pkgconfig_2.0.3

(Created on 2022-07-30 by the reprex package (v2.0.1))

On my computer, setting the seed argument or using set.seed() yields the same results:

mod_p <-
'
f1 =~  .7*x1 + .6*x2 + .8*x3 + .3*x5
f2 =~  .2*x1 + .6*x4 + .8*x5 + .7*x6
f1 ~~ .2*f2
'
set.seed(48157438)
dat <- lavaan::simulateData(mod_p, sample.nobs =  45L)
head(dat)
#>           x1         x2         x3         x5         x4          x6
#> 1 -1.2366642  0.7259477  0.5491608  0.9268323  0.5527255  1.49721893
#> 2  0.3268405  1.4969577 -1.6606505  1.1595120  0.1605944  0.06376055
#> 3  0.7278165  0.7564645  1.0609042  0.2983937 -1.2416582  0.76731424
#> 4 -2.1528254 -1.6096284 -0.1406489 -3.4034447 -1.1064092 -0.07856727
#> 5 -0.1264448  0.5489839 -0.1651137  1.2591702  1.0145481  0.89002483
#> 6  1.8225447  0.8487155  0.5422317  1.0778713  1.5752090  1.03206444
sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Chinese (Traditional)_Hong Kong SAR.utf8 
#> [2] LC_CTYPE=Chinese (Traditional)_Hong Kong SAR.utf8   
#> [3] LC_MONETARY=Chinese (Traditional)_Hong Kong SAR.utf8
#> [4] LC_NUMERIC=C                                        
#> [5] LC_TIME=Chinese (Traditional)_Hong Kong SAR.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] knitr_1.39        magrittr_2.0.3    MASS_7.3-57       mnormt_2.1.0     
#>  [5] pbivnorm_0.6.0    R.cache_0.15.0    rlang_1.0.3       fastmap_1.1.0    
#>  [9] fansi_1.0.3       stringr_1.4.0     styler_1.7.0      highr_0.9        
#> [13] tools_4.2.1       parallel_4.2.1    xfun_0.31         R.oo_1.25.0      
#> [17] utf8_1.2.2        cli_3.3.0         withr_2.5.0       htmltools_0.5.2  
#> [21] ellipsis_0.3.2    yaml_2.3.5        digest_0.6.29     tibble_3.1.7     
#> [25] lifecycle_1.0.1   lavaan_0.6-12     purrr_0.3.4       R.utils_2.12.0   
#> [29] vctrs_0.4.1       fs_1.5.2          glue_1.6.2        evaluate_0.15    
#> [33] rmarkdown_2.14    reprex_2.0.1      stringi_1.7.6     compiler_4.2.1   
#> [37] pillar_1.8.0      R.methodsS3_1.8.2 stats4_4.2.1      pkgconfig_2.0.3

(Created on 2022-07-30 by the reprex package (v2.0.1))

If this is, or may be, a lavaan-issue, we can raise an issue there, with examples showing different results from the same code, preferably using the seed argument in lavaan::simulateData() if we confirm that both set.seed() and the seed argument yield the same results on the same machine.

sfcheung commented 2 years ago

I ran the same code in a session of RStudio Cloud, and got different results:

> mod_p <-
+     '
+ f1 =~  .7*x1 + .6*x2 + .8*x3 + .3*x5
+ f2 =~  .2*x1 + .6*x4 + .8*x5 + .7*x6
+ f1 ~~ .2*f2
+ '
> dat <- lavaan::simulateData(mod_p, sample.nobs =  45L, seed = 48157438)
> head(dat)
          x1         x2         x3         x5         x4         x6
1 -0.7344868 -0.3225644  0.4174221  2.2034124  0.2687257  0.1382194
2  1.6038333 -0.3921951 -1.3723534  1.1893754  0.4021166 -0.5422432
3  0.9565809 -0.4570731  1.7242985  0.4226339 -0.3682637 -0.1886596
4 -3.0045571 -1.1943861 -0.2591753 -1.9304851 -2.3711637 -0.4345226
5  0.2512135  0.1162136 -0.3093157  1.6280467  0.7881543  0.5546011
6  1.5799831  1.1996108  0.3659563  1.4121253  0.4794301  1.6586023
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] MASS_7.3-57    compiler_4.2.1 parallel_4.2.1 tools_4.2.1    lavaan_0.6-12 
[6] mnormt_2.1.0   pbivnorm_0.6.0 stats4_4.2.1

It is not about MASS::mvnorm(). These are the results from my Windows computer (a fresh session with no reprex to avoid loading anything else)

> set.seed(1234); MASS::mvrnorm(10, c(0, 0), diag(2)); sessionInfo()
             [,1]       [,2]
 [1,]  0.47719270 -1.2070657
 [2,]  0.99838644  0.2774292
 [3,]  0.77625389  1.0844412
 [4,] -0.06445882 -2.3456977
 [5,] -0.95949406  0.4291247
 [6,]  0.11028549  0.5060559
 [7,]  0.51100951 -0.5747400
 [8,]  0.91119542 -0.5466319
 [9,]  0.83717168 -0.5644520
[10,] -2.41583518 -0.8900378
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Traditional)_Hong Kong SAR.utf8
[2] LC_CTYPE=Chinese (Traditional)_Hong Kong SAR.utf8
[3] LC_MONETARY=Chinese (Traditional)_Hong Kong SAR.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Traditional)_Hong Kong SAR.utf8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] MASS_7.3-57    compiler_4.2.1 cli_3.3.0      jsonlite_1.8.0 rlang_1.0.3

This is from my session at RStudio Cloud

> set.seed(1234); MASS::mvrnorm(10, c(0, 0), diag(2)); sessionInfo()
             [,1]       [,2]
 [1,]  0.47719270 -1.2070657
 [2,]  0.99838644  0.2774292
 [3,]  0.77625389  1.0844412
 [4,] -0.06445882 -2.3456977
 [5,] -0.95949406  0.4291247
 [6,]  0.11028549  0.5060559
 [7,]  0.51100951 -0.5747400
 [8,]  0.91119542 -0.5466319
 [9,]  0.83717168 -0.5644520
[10,] -2.41583518 -0.8900378
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] MASS_7.3-57    compiler_4.2.1 tools_4.2.1  
sfcheung commented 2 years ago

Now I think this is an issue of MASS::mvrorm(). May you try running the following on your computers?

set.seed(48157438)
tmp <- MASS::mvrnorm(n = 45, mu = c(0, 0, 0, 0, 0, 0),
                     Sigma = structure(c(1.586, 0.444, 0.592, 0.494, 0.204, 0.238, 0.444,
                                         1.36, 0.48, 0.276, 0.072, 0.084, 0.592, 0.48, 1.64, 0.368, 0.096,
                                         0.112, 0.494, 0.276, 0.368, 1.826, 0.516, 0.602, 0.204, 0.072,
                                         0.096, 0.516, 1.36, 0.42, 0.238, 0.084, 0.112, 0.602, 0.42, 1.49),
                                       dim = c(6L, 6L)),
                     empirical = FALSE)
head(tmp)
sessionInfo()

The values of the arguments are the values right before the following call in lavaan::simulateData() in our examples (https://github.com/sfcheung/semfindr/issues/47#issuecomment-1200058114), generated using dput():

https://github.com/yrosseel/lavaan/blob/70200542d9947c3481743308382053f0e5d84b28/R/lav_simulate_old.R#L244-L247

These are the results on my Windows 10 computer in a fresh session (no reprex to minimize the output of sessionInfo()):

> head(tmp)
           [,1]        [,2]       [,3]       [,4]       [,5]       [,6]
[1,] -0.5666245  1.04113854 -0.6357546  1.8927987  0.0830405  0.6044046
[2,]  1.6784428 -0.96362572 -1.0113183  1.1953877 -0.4934128  0.1971653
[3,]  0.4534500  0.94409544  1.1492976  0.3276048 -1.1669922  0.7483211
[4,] -2.8906343  0.68778314 -1.4566519 -2.7673726 -0.9738294 -0.7083442
[5,]  0.4385888  0.08646819 -0.4283660  1.5677962  0.7115686  0.6355679
[6,]  1.8789804  0.74660406  0.5946723  1.0105691  1.4920030  1.1641762
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Traditional)_Hong Kong SAR.utf8  LC_CTYPE=Chinese (Traditional)_Hong Kong SAR.utf8   
[3] LC_MONETARY=Chinese (Traditional)_Hong Kong SAR.utf8 LC_NUMERIC=C                                        
[5] LC_TIME=Chinese (Traditional)_Hong Kong SAR.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] MASS_7.3-57    compiler_4.2.1

These are the results on a RStudio Cloud session:

> head(tmp)
           [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
[1,] -0.3409452  0.3445963 -0.2875492  1.8307505 -0.6891646  1.2727270
[2,]  1.1975554 -2.0014002  0.3138339  0.8592679 -0.1506302  0.4248849
[3,]  0.2591959  0.6936230  1.6706614 -0.1084941 -0.6263129  0.8387816
[4,] -2.2470672  0.7486964 -2.1339891 -2.5846232 -1.7363104 -0.4475218
[5,]  0.4534455 -0.2866127 -0.1669889  1.5822867  0.4075718  0.8753313
[6,]  2.0321173  0.6654442  0.4511123  1.1983962  1.2955842  1.0741223
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] MASS_7.3-57    compiler_4.2.1 tools_4.2.1

The generated numbers are different.

sfcheung commented 2 years ago

@marklhc , I think I found the source of the differences. MASS::mvrnorm() calls eigen(), and the results of eigen() may depend on the platform in some cases, e.g., when the number of variables is six. These are from my Windows 10 machine:

p <- 6
r <- .1
tmp <- matrix(r, p, p)
diag(tmp) <- 1
etmp <- eigen(tmp)
tmp
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]  1.0  0.1  0.1  0.1  0.1  0.1
#> [2,]  0.1  1.0  0.1  0.1  0.1  0.1
#> [3,]  0.1  0.1  1.0  0.1  0.1  0.1
#> [4,]  0.1  0.1  0.1  1.0  0.1  0.1
#> [5,]  0.1  0.1  0.1  0.1  1.0  0.1
#> [6,]  0.1  0.1  0.1  0.1  0.1  1.0
etmp
#> eigen() decomposition
#> $values
#> [1] 1.5 0.9 0.9 0.9 0.9 0.9
#> 
#> $vectors
#>            [,1]       [,2]          [,3]       [,4]          [,5]
#> [1,] -0.4082483 -0.2993072  0.000000e+00  0.0000000  0.000000e+00
#> [2,] -0.4082483  0.9048458  0.000000e+00  0.0000000 -4.016825e-17
#> [3,] -0.4082483 -0.1513846  0.000000e+00  0.0000000  8.660254e-01
#> [4,] -0.4082483 -0.1513846 -8.756053e-17  0.8164966 -2.886751e-01
#> [5,] -0.4082483 -0.1513846 -7.071068e-01 -0.4082483 -2.886751e-01
#> [6,] -0.4082483 -0.1513846  7.071068e-01 -0.4082483 -2.886751e-01
#>            [,6]
#> [1,]  0.8624085
#> [2,]  0.1207783
#> [3,] -0.2457967
#> [4,] -0.2457967
#> [5,] -0.2457967
#> [6,] -0.2457967
sapply(seq_len(p), function(x) (tmp - diag(etmp$values[x], p, p)) %*% etmp$vectors[, x])
#>               [,1]          [,2]          [,3]          [,4]          [,5]
#> [1,] -5.551115e-17 -6.938894e-17  0.000000e+00  0.000000e+00  6.938894e-18
#> [2,]  8.604228e-16 -7.632783e-16  0.000000e+00  0.000000e+00  6.938894e-18
#> [3,]  3.191891e-16 -1.457168e-16  0.000000e+00  0.000000e+00 -1.179612e-16
#> [4,]  3.191891e-16 -1.457168e-16  0.000000e+00 -1.110223e-16  4.163336e-17
#> [5,]  3.191891e-16 -1.474515e-16  9.714451e-17  5.551115e-17  4.510281e-17
#> [6,]  3.053113e-16 -1.474515e-16 -9.714451e-17  5.551115e-17  4.510281e-17
#>              [,6]
#> [1,] 5.551115e-17
#> [2,] 6.938894e-17
#> [3,] 6.938894e-17
#> [4,] 7.632783e-17
#> [5,] 7.632783e-17
#> [6,] 7.632783e-17
sessionInfo()
#> R version 3.5.2 (2018-12-20)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Chinese (Traditional)_Hong Kong SAR.950 
#> [2] LC_CTYPE=Chinese (Traditional)_Hong Kong SAR.950   
#> [3] LC_MONETARY=Chinese (Traditional)_Hong Kong SAR.950
#> [4] LC_NUMERIC=C                                       
#> [5] LC_TIME=Chinese (Traditional)_Hong Kong SAR.950    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_3.5.2  magrittr_1.5    tools_3.5.2     htmltools_0.3.6
#>  [5] yaml_2.2.0      Rcpp_1.0.1      stringi_1.3.1   rmarkdown_1.11 
#>  [9] highr_0.7       knitr_1.21      stringr_1.4.0   xfun_0.5       
#> [13] digest_0.6.18   evaluate_0.13

Created on 2022-07-30 by the reprex package (v0.2.1)

These are from a session on RStudio Cloud:

p <- 6
r <- .1
tmp <- matrix(r, p, p)
diag(tmp) <- 1
etmp <- eigen(tmp)
tmp
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]  1.0  0.1  0.1  0.1  0.1  0.1
#> [2,]  0.1  1.0  0.1  0.1  0.1  0.1
#> [3,]  0.1  0.1  1.0  0.1  0.1  0.1
#> [4,]  0.1  0.1  0.1  1.0  0.1  0.1
#> [5,]  0.1  0.1  0.1  0.1  1.0  0.1
#> [6,]  0.1  0.1  0.1  0.1  0.1  1.0
etmp
#> eigen() decomposition
#> $values
#> [1] 1.5 0.9 0.9 0.9 0.9 0.9
#> 
#> $vectors
#>            [,1]        [,2]          [,3]        [,4]        [,5]       [,6]
#> [1,] -0.4082483 -0.28593648  0.000000e+00  0.81098660  0.00000000  0.3063893
#> [2,] -0.4082483  0.85215004  2.008412e-17 -0.02828010  0.00000000  0.3261501
#> [3,] -0.4082483 -0.42513472  1.543843e-01 -0.56353234  0.08776636  0.5508971
#> [4,] -0.4082483 -0.01901824  6.453651e-01 -0.03672663 -0.44649765 -0.4645067
#> [5,] -0.4082483 -0.10304234 -7.461908e-01 -0.14572091 -0.42420409 -0.2544231
#> [6,] -0.4082483 -0.01901824 -5.355861e-02 -0.03672663  0.78293538 -0.4645067
sapply(seq_len(p), function(x) (tmp - diag(etmp$values[x], p, p)) %*% etmp$vectors[, x])
#>               [,1]          [,2]          [,3]          [,4]          [,5]
#> [1,] -4.163336e-17 -2.233456e-17 -6.938894e-18 -3.989864e-17  1.387779e-17
#> [2,]  6.522560e-16 -6.884684e-16 -6.938894e-18  7.112366e-17  1.387779e-17
#> [3,]  1.526557e-16  6.093216e-17 -6.245005e-17  1.543904e-16  2.775558e-17
#> [4,]  1.526557e-16 -1.784597e-16 -2.428613e-16  7.459311e-17 -2.775558e-17
#> [5,]  1.595946e-16 -1.281527e-16  2.706169e-16  9.194034e-17 -2.775558e-17
#> [6,]  1.665335e-16 -1.778092e-16  1.301043e-17  7.632783e-17  8.326673e-17
#>               [,6]
#> [1,] -1.387779e-17
#> [2,] -1.387779e-17
#> [3,] -1.387779e-17
#> [4,] -6.938894e-17
#> [5,] -6.245005e-17
#> [6,] -8.326673e-17
sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.29   withr_2.5.0     magrittr_2.0.3  reprex_2.0.1   
#>  [5] evaluate_0.15   highr_0.9       stringi_1.7.8   rlang_1.0.4    
#>  [9] cli_3.3.0       rstudioapi_0.13 fs_1.5.2        rmarkdown_2.14 
#> [13] tools_4.2.1     stringr_1.4.0   glue_1.6.2      xfun_0.31      
#> [17] yaml_2.3.5      fastmap_1.1.0   compiler_4.2.1  htmltools_0.5.3
#> [21] knitr_1.39

Created on 2022-07-30 by the reprex package (v2.0.1)

The eigenvectors are "correct" enough in both cases. Moreover, all the eigenvalue except for the first one is the same and so the order of the eigenvectors (except for the first one) is arbitrary, even the values in the vector are the same. Nevertheless, these differences will result in different sets of random numbers generated, even with the same RNG state.

Note that the differences are not unique to the special matrix I used above. The same differences can be observed the previous examples

On my Windows 10 computer:

tmp <- structure(c(1.586, 0.444, 0.592, 0.494, 0.204, 0.238, 0.444,
                    1.36, 0.48, 0.276, 0.072, 0.084, 0.592, 0.48, 1.64, 0.368, 0.096,
                    0.112, 0.494, 0.276, 0.368, 1.826, 0.516, 0.602, 0.204, 0.072,
                    0.096, 0.516, 1.36, 0.42, 0.238, 0.084, 0.112, 0.602, 0.42, 1.49),
                  dim = c(6L, 6L))
eigen(tmp)
#> eigen() decomposition
#> $values
#> [1] 3.322434 1.939566 1.000000 1.000000 1.000000 1.000000
#> 
#> $vectors
#>            [,1]       [,2]        [,3]         [,4]       [,5]       [,6]
#> [1,] -0.4610573 -0.3134467  0.80862579  0.000000000  0.1878873  0.0000000
#> [2,] -0.3093321 -0.3829315 -0.46193551 -0.002677222  0.5901650  0.4427084
#> [3,] -0.4124429 -0.5105754 -0.31885036  0.247574235 -0.4916105 -0.4056652
#> [4,] -0.5553453  0.3417610 -0.11265487 -0.654843516 -0.3077723  0.1963570
#> [5,] -0.3005094  0.3999201 -0.13492474  0.041461774  0.5104402 -0.6850438
#> [6,] -0.3505943  0.4665734  0.01336226  0.712853927 -0.1394625  0.3627724

Created on 2022-07-30 by the reprex package (v0.2.1)

On a session of RStudio Cloud:

tmp <- structure(c(1.586, 0.444, 0.592, 0.494, 0.204, 0.238, 0.444,
                    1.36, 0.48, 0.276, 0.072, 0.084, 0.592, 0.48, 1.64, 0.368, 0.096,
                    0.112, 0.494, 0.276, 0.368, 1.826, 0.516, 0.602, 0.204, 0.072,
                    0.096, 0.516, 1.36, 0.42, 0.238, 0.084, 0.112, 0.602, 0.42, 1.49),
                  dim = c(6L, 6L))
eigen(tmp)
#> eigen() decomposition
#> $values
#> [1] 3.322434 1.939566 1.000000 1.000000 1.000000 1.000000
#> 
#> $vectors
#>            [,1]       [,2]        [,3]       [,4]        [,5]        [,6]
#> [1,] -0.4610573 -0.3134467  0.64152547  0.0000000  0.52689883  0.00000000
#> [2,] -0.3093321 -0.3829315 -0.73335152  0.2278065  0.39441196 -0.11142722
#> [3,] -0.4124429 -0.5105754  0.06144783  0.1033734 -0.73945567  0.08914657
#> [4,] -0.5553453  0.3417610 -0.19405059 -0.7312755 -0.04637273 -0.01486973
#> [5,] -0.3005094  0.3999201  0.08841609  0.4151751 -0.13270022 -0.74294072
#> [6,] -0.3505943  0.4665734 -0.03730611  0.4798791  0.01619793  0.65380031

Created on 2022-07-30 by the reprex package (v2.0.1)

Is it a known issue of eigen()?

sfcheung commented 2 years ago

Not yet have time to try but I guess it is also related to the content of the input matrix, not just the size. In the matrix from our lavaan::simulateDat() example, the differences in eigenvectors do not occur to the first two eigenvalues.

By the way, I also found the differences when p <- 4. You can try it.

I understand different methods for eigenvectors can yield different results. However, for the same function, it is reasonable to expect that this function yields platform-independent results, especially when it is used in cases in which we need reproducible results.

marklhc commented 2 years ago

Thanks for the detailed investigation. I also found that the result of eigen() is different across different BLAS routines on the same computer. This is the result when I use openblas:

tmp <- structure(c(1.586, 0.444, 0.592, 0.494, 0.204, 0.238, 0.444,
                   1.36, 0.48, 0.276, 0.072, 0.084, 0.592, 0.48, 1.64, 0.368, 0.096,
                   0.112, 0.494, 0.276, 0.368, 1.826, 0.516, 0.602, 0.204, 0.072,
                   0.096, 0.516, 1.36, 0.42, 0.238, 0.084, 0.112, 0.602, 0.42, 1.49),
                 dim = c(6L, 6L))
eigen(tmp)
#> eigen() decomposition
#> $values
#> [1] 3.322434 1.939566 1.000000 1.000000 1.000000 1.000000
#> 
#> $vectors
#>           [,1]       [,2]        [,3]        [,4]       [,5]        [,6]
#> [1,] 0.4610573 -0.3134467  0.79305850  0.00000000 -0.2454292  0.00000000
#> [2,] 0.3093321 -0.3829315 -0.31531252  0.79634331  0.0512865  0.14649956
#> [3,] 0.4124429 -0.5105754 -0.31414685 -0.53198266  0.4117730 -0.13398835
#> [4,] 0.5553453  0.3417610 -0.38211985 -0.17406620 -0.6279662  0.06430314
#> [5,] 0.3005094  0.3999201  0.07460577  0.22918674  0.2948514 -0.77764266
#> [6,] 0.3505943  0.4665734  0.14617246  0.00248703  0.5350685  0.59306155
sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.4 LTS
#> 
#> Matrix products: default
#> BLAS/LAPACK: /opt/OpenBLAS/lib/libopenblas-r0.3.13.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.13   knitr_1.39        magrittr_2.0.3    R.cache_0.16.0   
#>  [5] rlang_1.0.4       fastmap_1.1.0     fansi_1.0.3       stringr_1.4.0    
#>  [9] styler_1.7.0      highr_0.9         tools_4.2.1       xfun_0.31        
#> [13] R.oo_1.25.0       utf8_1.2.2        cli_3.3.0         withr_2.5.0      
#> [17] htmltools_0.5.3   yaml_2.3.5        digest_0.6.29     tibble_3.1.8     
#> [21] lifecycle_1.0.1   purrr_0.3.4       R.utils_2.12.0    vctrs_0.4.1      
#> [25] fs_1.5.2          glue_1.6.2        evaluate_0.15     rmarkdown_2.14   
#> [29] reprex_2.0.1      stringi_1.7.8     compiler_4.2.1    pillar_1.8.0     
#> [33] R.methodsS3_1.8.2 pkgconfig_2.0.3

Created on 2022-07-30 by the reprex package (v2.0.1)

And this is with the regular/default BLAS of Ubuntu

tmp <- structure(c(1.586, 0.444, 0.592, 0.494, 0.204, 0.238, 0.444,
                   1.36, 0.48, 0.276, 0.072, 0.084, 0.592, 0.48, 1.64, 0.368, 0.096,
                   0.112, 0.494, 0.276, 0.368, 1.826, 0.516, 0.602, 0.204, 0.072,
                   0.096, 0.516, 1.36, 0.42, 0.238, 0.084, 0.112, 0.602, 0.42, 1.49),
                 dim = c(6L, 6L))
eigen(tmp)
#> eigen() decomposition
#> $values
#> [1] 3.322434 1.939566 1.000000 1.000000 1.000000 1.000000
#> 
#> $vectors
#>            [,1]       [,2]        [,3]         [,4]       [,5]       [,6]
#> [1,] -0.4610573 -0.3134467  0.80862579  0.000000000  0.1878873  0.0000000
#> [2,] -0.3093321 -0.3829315 -0.46193551 -0.002677222  0.5901650  0.4427084
#> [3,] -0.4124429 -0.5105754 -0.31885036  0.247574235 -0.4916105 -0.4056652
#> [4,] -0.5553453  0.3417610 -0.11265487 -0.654843516 -0.3077723  0.1963570
#> [5,] -0.3005094  0.3999201 -0.13492474  0.041461774  0.5104402 -0.6850438
#> [6,] -0.3505943  0.4665734  0.01336226  0.712853927 -0.1394625  0.3627724
sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.13   knitr_1.39        magrittr_2.0.3    R.cache_0.16.0   
#>  [5] rlang_1.0.4       fastmap_1.1.0     fansi_1.0.3       stringr_1.4.0    
#>  [9] styler_1.7.0      highr_0.9         tools_4.2.1       xfun_0.31        
#> [13] R.oo_1.25.0       utf8_1.2.2        cli_3.3.0         withr_2.5.0      
#> [17] htmltools_0.5.3   yaml_2.3.5        digest_0.6.29     tibble_3.1.8     
#> [21] lifecycle_1.0.1   purrr_0.3.4       R.utils_2.12.0    vctrs_0.4.1      
#> [25] fs_1.5.2          glue_1.6.2        evaluate_0.15     rmarkdown_2.14   
#> [29] reprex_2.0.1      stringi_1.7.8     compiler_4.2.1    pillar_1.8.0     
#> [33] R.methodsS3_1.8.2 pkgconfig_2.0.3

Created on 2022-07-30 by the reprex package (v2.0.1)

I couldn't find mentions of such an issue. Maybe it's worth posting on stack overflow or some other places?

It also seems to be an issue for eigenvectors with the same eigenvalue.

sfcheung commented 2 years ago

For the tests, solved by using a pregenerated dataset (80cc671a6e9c6d8d01902bcfda27ead42a4a2edf).