Closed hynekcigler closed 1 year ago
Definitely a confirmed bug related to the presence of completely missing rows (correct results are those without the full missing rows). I've sent a patch to the dev version, which your welcome to test out. However, I haven't closed the issue since there were a few objects that were undefined in your script (fit3
, fit3pcm
), so I'll leave it open just in case there's something going on in itemfit()
too (the example with itemfit(fit1...)
does however look to be correct now). Thanks for the report.
Great, thank you! The fit3
and fit3pcm
objects were my fault; I previously had another additional object (with a complete list-wise deletion) for testing purposes, which I deleted but forgot to change these four lines in my script. I have edited my initial script now, sorry for the mistake.
Unfortunately, I cannot test your dev version, as I am not able to install dev version on my PC (probably an error with RcppArmadillo package, with a message g++.exe: error: unrecognized command line option '-std=gnu++17'
) and have to keep the same mirt version on my work laptop right now.
However, if the calls with fit1 and fit2 (and also fit1pcm and fit2pcm) give you the same results, it should be patched successfully.
No problem. Here's what I get from reprex()
when using the dev version now.
M2dat <- read.csv("https://www.dropbox.com/scl/fi/suuy9w1qepl9vx7pv9y90/M2problem.csv?rlkey=qg18985ao3f29zj740te8usbm&raw=1")
M2dat <- M2dat[-1]
M2dat2 <- M2dat[rowSums(is.na(M2dat)) < 6, ] ## only non-empty rows
library(mirt)
#> Loading required package: stats4
#> Loading required package: lattice
sessionInfo()
#> R version 4.2.2 (2022-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22621)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_Canada.utf8 LC_CTYPE=English_Canada.utf8
#> [3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C
#> [5] LC_TIME=English_Canada.utf8
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] mirt_1.40.2 lattice_0.20-45
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.11 compiler_4.2.2 dcurver_0.9.2
#> [4] R.methodsS3_1.8.2 R.utils_2.12.2 tools_4.2.2
#> [7] digest_0.6.33 gtable_0.3.4 evaluate_0.21
#> [10] lifecycle_1.0.3 nlme_3.1-163 R.cache_0.16.0
#> [13] mgcv_1.9-0 rlang_1.1.1 reprex_2.0.2
#> [16] Matrix_1.6-1 cli_3.6.1 rstudioapi_0.15.0
#> [19] yaml_2.3.7 parallel_4.2.2 xfun_0.40
#> [22] fastmap_1.1.1 gridExtra_2.3 cluster_2.1.4
#> [25] withr_2.5.0 styler_1.10.2 knitr_1.43
#> [28] fs_1.6.3 vctrs_0.6.3 grid_4.2.2
#> [31] glue_1.6.2 pbapply_1.7-2 rmarkdown_2.24
#> [34] GPArotation_2023.8-1 purrr_1.0.2 magrittr_2.0.3
#> [37] MASS_7.3-60 htmltools_0.5.6 splines_4.2.2
#> [40] permute_0.9-7 Deriv_4.1.3 vegan_2.6-4
#> [43] R.oo_1.25.0
## Fit the models
fit1 <- mirt(M2dat, 1, itemtype = "gpcm", SE = T) ## model with missing data
#> Warning: data contains response patterns with only NAs
#> Iteration: 1, Log-Lik: -2607.249, Max-Change: 2.41523Iteration: 2, Log-Lik: -2423.933, Max-Change: 0.32486Iteration: 3, Log-Lik: -2418.795, Max-Change: 0.14952Iteration: 4, Log-Lik: -2417.405, Max-Change: 0.10601Iteration: 5, Log-Lik: -2416.817, Max-Change: 0.09550Iteration: 6, Log-Lik: -2416.599, Max-Change: 0.03412Iteration: 7, Log-Lik: -2416.551, Max-Change: 0.02747Iteration: 8, Log-Lik: -2416.491, Max-Change: 0.01370Iteration: 9, Log-Lik: -2416.463, Max-Change: 0.01094Iteration: 10, Log-Lik: -2416.440, Max-Change: 0.00589Iteration: 11, Log-Lik: -2416.439, Max-Change: 0.00143Iteration: 12, Log-Lik: -2416.438, Max-Change: 0.00029Iteration: 13, Log-Lik: -2416.438, Max-Change: 0.00026Iteration: 14, Log-Lik: -2416.438, Max-Change: 0.00015Iteration: 15, Log-Lik: -2416.438, Max-Change: 0.00010Iteration: 16, Log-Lik: -2416.438, Max-Change: 0.00009
#>
#> Calculating information matrix...
fit2 <- mirt(M2dat2, 1, itemtype = "gpcm", SE = T) ## model without empty cases
#> Iteration: 1, Log-Lik: -2607.249, Max-Change: 2.41523Iteration: 2, Log-Lik: -2423.933, Max-Change: 0.32486Iteration: 3, Log-Lik: -2418.795, Max-Change: 0.14952Iteration: 4, Log-Lik: -2417.405, Max-Change: 0.10601Iteration: 5, Log-Lik: -2416.817, Max-Change: 0.09550Iteration: 6, Log-Lik: -2416.599, Max-Change: 0.03412Iteration: 7, Log-Lik: -2416.551, Max-Change: 0.02747Iteration: 8, Log-Lik: -2416.491, Max-Change: 0.01370Iteration: 9, Log-Lik: -2416.463, Max-Change: 0.01094Iteration: 10, Log-Lik: -2416.440, Max-Change: 0.00589Iteration: 11, Log-Lik: -2416.439, Max-Change: 0.00143Iteration: 12, Log-Lik: -2416.438, Max-Change: 0.00029Iteration: 13, Log-Lik: -2416.438, Max-Change: 0.00026Iteration: 14, Log-Lik: -2416.438, Max-Change: 0.00015Iteration: 15, Log-Lik: -2416.438, Max-Change: 0.00010Iteration: 16, Log-Lik: -2416.438, Max-Change: 0.00009
#>
#> Calculating information matrix...
fit1pcm <- mirt(M2dat, 1, itemtype = "Rasch", SE = T) ## model with missing data
#> Warning: data contains response patterns with only NAs
#> Iteration: 1, Log-Lik: -2614.066, Max-Change: 2.28460Iteration: 2, Log-Lik: -2430.631, Max-Change: 0.43480Iteration: 3, Log-Lik: -2427.815, Max-Change: 0.16245Iteration: 4, Log-Lik: -2427.421, Max-Change: 0.36219Iteration: 5, Log-Lik: -2426.869, Max-Change: 0.05091Iteration: 6, Log-Lik: -2426.679, Max-Change: 0.04703Iteration: 7, Log-Lik: -2426.529, Max-Change: 0.02352Iteration: 8, Log-Lik: -2426.513, Max-Change: 0.01115Iteration: 9, Log-Lik: -2426.518, Max-Change: 0.01301Iteration: 10, Log-Lik: -2426.533, Max-Change: 0.01999Iteration: 11, Log-Lik: -2426.544, Max-Change: 0.00678Iteration: 12, Log-Lik: -2426.554, Max-Change: 0.00740Iteration: 13, Log-Lik: -2426.567, Max-Change: 0.01530Iteration: 14, Log-Lik: -2426.578, Max-Change: 0.00193Iteration: 15, Log-Lik: -2426.585, Max-Change: 0.00304Iteration: 16, Log-Lik: -2426.591, Max-Change: 0.00133Iteration: 17, Log-Lik: -2426.597, Max-Change: 0.00235Iteration: 18, Log-Lik: -2426.600, Max-Change: 0.00202Iteration: 19, Log-Lik: -2426.604, Max-Change: 0.00480Iteration: 20, Log-Lik: -2426.609, Max-Change: 0.00049Iteration: 21, Log-Lik: -2426.611, Max-Change: 0.00077Iteration: 22, Log-Lik: -2426.613, Max-Change: 0.00031Iteration: 23, Log-Lik: -2426.614, Max-Change: 0.00077Iteration: 24, Log-Lik: -2426.615, Max-Change: 0.00039Iteration: 25, Log-Lik: -2426.616, Max-Change: 0.00059Iteration: 26, Log-Lik: -2426.617, Max-Change: 0.00030Iteration: 27, Log-Lik: -2426.618, Max-Change: 0.00045Iteration: 28, Log-Lik: -2426.618, Max-Change: 0.00012Iteration: 29, Log-Lik: -2426.619, Max-Change: 0.00008
#>
#> Calculating information matrix...
fit2pcm <- mirt(M2dat2, 1, itemtype = "Rasch", SE = T) ## model without empty cases
#> Iteration: 1, Log-Lik: -2614.066, Max-Change: 2.28460Iteration: 2, Log-Lik: -2430.631, Max-Change: 0.43480Iteration: 3, Log-Lik: -2427.815, Max-Change: 0.16245Iteration: 4, Log-Lik: -2427.421, Max-Change: 0.36219Iteration: 5, Log-Lik: -2426.869, Max-Change: 0.05091Iteration: 6, Log-Lik: -2426.679, Max-Change: 0.04703Iteration: 7, Log-Lik: -2426.529, Max-Change: 0.02352Iteration: 8, Log-Lik: -2426.513, Max-Change: 0.01115Iteration: 9, Log-Lik: -2426.518, Max-Change: 0.01301Iteration: 10, Log-Lik: -2426.533, Max-Change: 0.01999Iteration: 11, Log-Lik: -2426.544, Max-Change: 0.00678Iteration: 12, Log-Lik: -2426.554, Max-Change: 0.00740Iteration: 13, Log-Lik: -2426.567, Max-Change: 0.01530Iteration: 14, Log-Lik: -2426.578, Max-Change: 0.00193Iteration: 15, Log-Lik: -2426.585, Max-Change: 0.00304Iteration: 16, Log-Lik: -2426.591, Max-Change: 0.00133Iteration: 17, Log-Lik: -2426.597, Max-Change: 0.00235Iteration: 18, Log-Lik: -2426.600, Max-Change: 0.00202Iteration: 19, Log-Lik: -2426.604, Max-Change: 0.00480Iteration: 20, Log-Lik: -2426.609, Max-Change: 0.00049Iteration: 21, Log-Lik: -2426.611, Max-Change: 0.00077Iteration: 22, Log-Lik: -2426.613, Max-Change: 0.00031Iteration: 23, Log-Lik: -2426.614, Max-Change: 0.00077Iteration: 24, Log-Lik: -2426.615, Max-Change: 0.00039Iteration: 25, Log-Lik: -2426.616, Max-Change: 0.00059Iteration: 26, Log-Lik: -2426.617, Max-Change: 0.00030Iteration: 27, Log-Lik: -2426.618, Max-Change: 0.00045Iteration: 28, Log-Lik: -2426.618, Max-Change: 0.00012Iteration: 29, Log-Lik: -2426.619, Max-Change: 0.00008
#>
#> Calculating information matrix...
## Models are exactly the same
anova(fit1, fit2)
#> AIC SABIC HQ BIC logLik X2 df p
#> fit1 4880.877 4904.724 4920.203 4980.897 -2416.438
#> fit2 4880.877 4904.724 4920.203 4980.897 -2416.438 0 0 NaN
anova(fit1pcm, fit2pcm)
#> AIC SABIC HQ BIC logLik X2 df p
#> fit1pcm 4891.238 4910.117 4922.371 4970.421 -2426.619
#> fit2pcm 4891.238 4910.117 4922.371 4970.421 -2426.619 0 0 NaN
## C2 of fit1 and fit3 should be the same, but they are not
M2(fit1, na.rm = T, type="C2")
#> Sample size after row-wise response data removal: 436
#> M2 df p RMSEA RMSEA_5 RMSEA_95 SRMSR TLI CFI
#> stats 7.388424 9 0.5967474 0 0 0.04676177 0.03959729 1.008525 1
M2(fit2, na.rm = T, type="C2")
#> Sample size after row-wise response data removal: 436
#> M2 df p RMSEA RMSEA_5 RMSEA_95 SRMSR TLI CFI
#> stats 7.388424 9 0.5967474 0 0 0.04676177 0.03959729 1.008525 1
## M2 works well
M2(fit1pcm, na.rm = T, type="M2*")
#> Sample size after row-wise response data removal: 436
#> M2 df p RMSEA RMSEA_5 RMSEA_95 SRMSR TLI
#> stats 3.732157 2 0.1547292 0.04462046 0 0.1143858 0.0732038 0.8908063
#> CFI
#> stats 0.9272042
M2(fit2pcm, na.rm = T, type="M2*")
#> Sample size after row-wise response data removal: 436
#> M2 df p RMSEA RMSEA_5 RMSEA_95 SRMSR TLI
#> stats 3.732157 2 0.1547292 0.04462046 0 0.1143858 0.0732038 0.8908063
#> CFI
#> stats 0.9272042
## Empirical ICC for item 4
itemfit(fit1, na.rm = F, empirical.plot = 4, empirical.poly.collapse = T) ## seems to be ok, na.rm=F
itemfit(fit1, na.rm = T, empirical.plot = 4, empirical.poly.collapse = T) ## clear misfit.
#> Sample size after row-wise response data removal: 436
#itemfit(fit3, na.rm = F, empirical.plot = 4, empirical.poly.collapse = T) ## again ok...
#itemfit(fit3, na.rm = T, empirical.plot = 4, empirical.poly.collapse = T) ## ... but this is again too, despite it has to be same as with fit1
## Here, the problems remains despite the PCM model
itemfit(fit1pcm, na.rm = F, empirical.plot = 4, empirical.poly.collapse = T) ## seems to be ok, na.rm=F
itemfit(fit1pcm, na.rm = T, empirical.plot = 4, empirical.poly.collapse = T) ## clear misfit.
#> Sample size after row-wise response data removal: 436
#itemfit(fit3pcm, na.rm = F, empirical.plot = 4, empirical.poly.collapse = T) ## again ok...
#itemfit(fit3pcm, na.rm = T, empirical.plot = 4, empirical.poly.collapse = T) ## ... but this is again too, despite it has to be same as with fit1
Ȁ
#> Error in eval(expr, envir, enclos): object 'Ȁ' not found
Created on 2023-09-13 with reprex v2.0.2
I confirm, the result is correct. I think it's solved. Thank you!
Dear Phil, I discovered a bug related to completely empty rows with M2 and itemfit function. I am currently using mirt version 1.38.1, being afraid to update as I am working on a huge project. Therefore, I hope this bug is still actual as I don't see it in news.
If I fit models with completely empty rows and GPCM item types, M2 function with
type = "C2"
and itemfit functions provide nonsense results. Here is an example of the code:Despite models fit1 and fit2 are exactly the same, empirical plots differs and also C2 statistics differs. I suppose these functions provides internal factor score estimates also for completely missing data. I fitted also PCM models fit1pcm and fit2pcm. Here you can see M2 statitics work well, but C2 is still biased and the same with empirical ICCs. You may check, but I am pretty sure this is not the case with binary items and/or M2 or M2 statistics, but haven't tested it so far.