philchalmers / mirt

Multidimensional item response theory
https://philchalmers.github.io/mirt/

Inclusion of other variables in data frame for a MIRT model #204

Closed kephartg closed 3 years ago

kephartg commented 3 years ago

My apologies if this is not the proper place to post this. I am having trouble retaining an id variable, or any other variable in the input data that I do not want to include in the mirt model. I am not sure if it is a code issue, a documentation issue, or ignorance on my part.

It seems that the data frame can only include the variables used in a mirt model.

For example, I have a full data set from which I extract the id variable plus the variables I need for the model:

resvars <- c("id","res8", "res9", "res11", "res12", "res13", "res15")
resdata <- full_data[resvars]

Then I specify a mirt model and run it:

res_mdl <- " RESOURCE = res8,res9,res11,res12,res13,res15"
res_mirt_mdl <- mirt.model(res_mdl, itemnames=colnames(resdata))
mod_res <- mirt(resdata, res_mirt_mdl, itemtype='graded')

I would expect mirt to only use the variables I have specified in the model, but it is trying to include the id variable in the model, as can be seen in the console output, which is obviously not what I want:

console output:
mod_res <- mirt(resdata, res_mirt_mdl, itemtype='graded',removeEmptyRows=TRUE)
"id" re-mapped to ensure all categories have a distance of 1
Iteration: 15, Log-Lik: -14935.432, Max-Change: 0.01863
Warning message:
The following items have a large number of categories which may cause estimation issues: 1

So, why is this happening, and is there a way to retain an id variable or other variables in the data frame that are not in the model? This is needed, for example, if one wants to estimate person-fit statistics and merge them back into a data file. I have also noticed that if I drop a variable from the model, it is still included in the output, with zero values for the discrimination parameter along with estimates for the thresholds.

philchalmers commented 3 years ago

I decided to rework how to deal with response patterns that are completely missing, and now just allow them to be passed as-is with a warning message to the user so that they are aware. This removes the need for a removeEmptyRows logical, which has now been deprecated. This is the new behaviour of how mirt works throughout the package. Hopefully it solves your issue, and thanks for bringing the conceptual ambiguity to my attention.

> library(mirt)
> dat <- Science
> dat[c(1,5,6), ] <- NA
> head(dat)
  Comfort Work Future Benefit
1      NA   NA     NA      NA
2       3    3      3       3
3       3    2      2       3
4       3    2      2       3
5      NA   NA     NA      NA
6      NA   NA     NA      NA
> 
> mod <- mirt(dat, 1)
Iteration: 24, Log-Lik: -1590.189, Max-Change: 0.00008
Warning message:
data contains response patterns with only NAs 
> fs <- fscores(mod)
> head(fs)
              F1
[1,]          NA
[2,]  0.05836796
[3,] -0.87174312
[4,] -0.87174312
[5,]          NA
[6,]          NA
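
Since rows with only NAs are now kept, fscores() returns one row per input row, which suggests a way to carry an id variable along: keep it in the full data frame, pass only the item columns to mirt(), and bind the scores back by row position. The following is a minimal sketch of that idea, not a confirmed recommendation from the package author. The id column and the names full_data, items, and scored are made up for illustration, and it assumes (as the console output in the question suggests) that mirt() treats every column of the data it receives as an item.

```r
library(mirt)

# Hypothetical full data set: the Science items plus an illustrative id column
full_data <- data.frame(id = seq_len(nrow(Science)), Science)
full_data[c(1, 5, 6), -1] <- NA   # a few respondents with no item responses

# Names of the columns that are actual items
items <- c("Comfort", "Work", "Future", "Benefit")

# Pass only the item columns to mirt(); id stays behind in full_data
mod <- mirt(full_data[items], 1, verbose = FALSE)

# One row of scores per input row, so they can be bound back to id by position
fs <- fscores(mod)
scored <- cbind(full_data["id"], fs)
head(scored)
```

Rows whose item responses are all missing keep their id and get NA for the factor score, so the result can be merged into other files without re-aligning rows.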