philchalmers / mirt

Multidimensional item response theory
https://philchalmers.github.io/mirt/

Unclear error in `personfit` when nrow() of original data != nrow of provided Theta #246

Closed netique closed 10 months ago

netique commented 10 months ago

Hello,

While subsetting my matrix of estimated thetas by mistake, I ran into the error messages below, which seemed a bit mysterious. I do not expect the last two calls in the reprex to work, since `itemtrace * fulldata` would not be conformable, but it might be worth stressing this in the documentation or in the error message. What do you think?

Best, Jan

library(mirt)
#> Loading required package: stats4
#> Loading required package: lattice

mod <- mirt(Science)
#> Iteration: 36, Log-Lik: -1608.870, Max-Change: 0.00010

thetas <- fscores(mod)

pfit <- personfit(mod, Theta = thetas)
pfit_subset <- personfit(mod, Theta = thetas[1:100, ])
#> Error in if (nrow(fulldata) < nrow(Theta)) Theta <- Theta[extract.mirt(x, : argument is of length zero
pfit_subset_nodrop <- personfit(mod, Theta = thetas[1:100, , drop = FALSE])
#> Error in `[<-`(`*tmp*`, missing_loc[, i], itemloc[i]:(itemloc[i + 1L] - : (subscript) logical subscript too long

Created on 2023-11-13 with reprex v2.0.2
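
The two failing calls differ only in `drop = FALSE`, and the first error message follows from how base R subsets a one-column matrix. The sketch below uses a stand-in matrix rather than a fitted model; the row count of 392 and the column name `F1` are illustrative assumptions, not values taken from this thread.

```r
# Stand-in for fscores(mod): a one-column matrix of factor scores.
# (Row count and column name are assumptions for illustration.)
thetas <- matrix(rnorm(392), ncol = 1, dimnames = list(NULL, "F1"))

sub_vec <- thetas[1:100, ]                # dimensions dropped: plain numeric vector
sub_mat <- thetas[1:100, , drop = FALSE]  # dimensions kept: 100 x 1 matrix

is.matrix(sub_vec)  # FALSE
nrow(sub_vec)       # NULL, because a plain vector has no dim attribute
is.matrix(sub_mat)  # TRUE
nrow(sub_mat)       # 100

# Comparing a number against NULL yields a zero-length logical, which `if`
# rejects with "argument is of length zero", as in the first error above.
cmp <- 392 < nrow(sub_vec)
length(cmp)         # 0
```

With `drop = FALSE` the input is a proper matrix, so that check passes, but it still has fewer rows than the data, which is where the second error arises.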

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       macOS Sonoma 14.1.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Prague
#>  date     2023-11-13
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version  date (UTC) lib source
#>  cli           3.6.1    2023-03-23 [1] CRAN (R 4.3.1)
#>  cluster       2.1.4    2022-08-22 [2] CRAN (R 4.3.2)
#>  dcurver       0.9.2    2020-11-04 [1] CRAN (R 4.3.0)
#>  Deriv         4.1.3    2021-02-24 [1] CRAN (R 4.3.0)
#>  digest        0.6.33   2023-07-07 [1] CRAN (R 4.3.0)
#>  evaluate      0.22     2023-09-29 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1    2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3    2023-07-20 [1] CRAN (R 4.3.0)
#>  glue          1.6.2    2022-02-24 [1] CRAN (R 4.3.0)
#>  GPArotation   2023.8-1 2023-08-21 [1] CRAN (R 4.3.0)
#>  gridExtra     2.3      2017-09-09 [1] CRAN (R 4.3.0)
#>  gtable        0.3.4    2023-08-21 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.6.1  2023-10-06 [1] CRAN (R 4.3.1)
#>  knitr         1.44     2023-09-11 [1] CRAN (R 4.3.0)
#>  lattice     * 0.21-9   2023-10-01 [2] CRAN (R 4.3.2)
#>  lifecycle     1.0.4    2023-11-07 [1] CRAN (R 4.3.2)
#>  magrittr      2.0.3    2022-03-30 [1] CRAN (R 4.3.0)
#>  MASS          7.3-60   2023-05-04 [1] CRAN (R 4.3.0)
#>  Matrix        1.6-1.1  2023-09-18 [1] CRAN (R 4.3.1)
#>  mgcv          1.9-0    2023-07-11 [2] CRAN (R 4.3.2)
#>  mirt        * 1.41.1   2023-11-12 [1] Github (philchalmers/mirt@d4b4491)
#>  nlme          3.1-163  2023-08-09 [2] CRAN (R 4.3.2)
#>  pbapply       1.7-2    2023-06-27 [1] CRAN (R 4.3.0)
#>  permute       0.9-7    2022-01-27 [1] CRAN (R 4.3.0)
#>  purrr         1.0.2    2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0   2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2    2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0   2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2   2022-11-11 [1] CRAN (R 4.3.0)
#>  Rcpp          1.0.11   2023-07-06 [1] CRAN (R 4.3.0)
#>  reprex        2.0.2    2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang         1.1.1    2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown     2.25     2023-09-18 [1] CRAN (R 4.3.1)
#>  rstudioapi    0.15.0   2023-07-07 [1] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2    2021-12-06 [1] CRAN (R 4.3.0)
#>  styler        1.10.2   2023-08-29 [1] CRAN (R 4.3.0)
#>  vctrs         0.6.4    2023-10-12 [1] CRAN (R 4.3.1)
#>  vegan         2.6-4    2022-10-11 [1] CRAN (R 4.3.0)
#>  withr         2.5.1    2023-09-26 [1] CRAN (R 4.3.1)
#>  xfun          0.40     2023-08-09 [1] CRAN (R 4.3.0)
#>  yaml          2.3.7    2023-01-23 [1] CRAN (R 4.3.0)
#>
#>  [1] /Users/netik/Library/R/arm64/4.3/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

philchalmers commented 10 months ago

I'm sure there are a bunch of error messages that could be thrown to deter such behaviour throughout the package, some of which are harder to anticipate. This one is fairly clear, though: every response vector needs an associated $\hat{\theta}$, so supplying a matrix of Theta values with fewer rows than the original sample size should throw an error (as it does, albeit in a rather ugly way). The first error message could have been inferred from the documentation, since Theta is advertised as a matrix, though that too could be checked earlier in the code.
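
As a rough illustration of what an earlier check could look like, here is a minimal sketch in base R. `check_theta()` and its arguments are hypothetical names invented for this example; they are not part of mirt, and the actual fix in the package may look quite different.

```r
# Hypothetical helper, NOT part of mirt: validate Theta before it is used.
# `n_persons` is assumed to be the number of response vectors in the data.
check_theta <- function(Theta, n_persons) {
  if (!is.matrix(Theta)) {
    stop("Theta must be a matrix; use drop = FALSE when subsetting.",
         call. = FALSE)
  }
  if (nrow(Theta) != n_persons) {
    stop(sprintf("Theta has %d rows but the data has %d response vectors.",
                 nrow(Theta), n_persons), call. = FALSE)
  }
  invisible(Theta)
}

# Usage with a stand-in Theta (illustrative sizes only)
theta_full <- matrix(rnorm(10), ncol = 1)

# Returns the condition message instead of stopping, so both cases can be shown
msg_vector <- tryCatch(check_theta(theta_full[1:5, ], 10),
                       error = function(e) conditionMessage(e))
msg_short  <- tryCatch(check_theta(theta_full[1:5, , drop = FALSE], 10),
                       error = function(e) conditionMessage(e))
ok <- check_theta(theta_full, 10)  # passes silently and returns Theta
```

On the user side, one way to get fit statistics for a subset of respondents is probably to run `personfit()` on the full sample and subset the rows of the result afterwards, rather than subsetting `Theta` beforehand.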