philchalmers / mirt

Multidimensional item response theory
https://philchalmers.github.io/mirt/

Factor scores from mixedmirt model shorter than length of input variables #168

Closed isaactpetersen closed 4 years ago

isaactpetersen commented 5 years ago

Love the package, Phil. Thanks for all your hard work on it. I'm trying to obtain factor scores from a mixedmirt() object. I read here (https://github.com/philchalmers/mirt/issues/101) to use the randef() function instead of fscores() to obtain factor scores from a mixedmirt() object. It appears I can obtain factor scores from my mixedmirt() object using randef(mixedModelName)$Theta. Is that correct? However, I'm trying to merge that Theta vector with the original input data.frame, but the length of that vector (173) is shorter than the length of the input variables (198). When I try to merge the factor scores with the original data.frame, I get the following error:

Error in `$<-.data.frame`(`*tmp*`, mixedModelName, value = c(1.07338604449888, : replacement has 173 rows, data has 198

How can I know which factor score corresponds to which row of the input data, given their different lengths? Would it be possible to pad the factor scores with NAs for the relevant rows so that the lengths match (similar to how lm performs with na.exclude: https://stats.stackexchange.com/a/11028)?

Many thanks!

philchalmers commented 5 years ago

You're correct that randef(mixedModelName)$Theta is the correct element to extract, but it's surprising to me that the lengths are different. This implies that rows are being removed somehow, which is generally against the package's philosophy on managing data (though it's possible to do things like this with explicit arguments like technical = list(removeEmptyRows = TRUE)). Can you provide a reproducible example of this issue?

isaactpetersen commented 5 years ago

Great, thanks for clarifying. I'm attaching a reproducible example. Thanks very much for looking into this. There's a great deal of missingness, so that could have something to do with it. Reproducible example.zip

isaactpetersen commented 5 years ago

This one might work better for you--I noticed some missing syntax from the last one. Sorry about that. Reproducible_example.zip

philchalmers commented 4 years ago

Thanks for the reproducible example, that was quite helpful. I originally modeled the behaviour of the covdata object to reflect what the lm() function does when NAs are present, in which observations containing NAs are removed row-wise without warning. However, I now realize that while the behaviour is fine, the lack of a warning message is not. I've now added a warning to let the user know their data has been modified.
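For reference, the lm() behaviour mentioned above can be seen in a small base-R sketch. The data frame `d` and its values are made up for illustration; only lm(), na.omit, na.exclude and residuals() are real base-R functions.

```r
# Toy data (hypothetical): two of six rows have a missing predictor
d <- data.frame(y = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
                x = c(1, 2, NA, 4, NA, 6))

# na.omit is lm()'s usual default: incomplete rows are dropped silently
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)

# na.exclude drops the same rows for fitting but pads residuals with NA
fit_exclude <- lm(y ~ x, data = d, na.action = na.exclude)

n_omit    <- length(residuals(fit_omit))     # one residual per complete row
n_exclude <- length(residuals(fit_exclude))  # one residual per input row
```

With na.omit the residual vector is shorter than the input, which is the same kind of length mismatch reported in this issue.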

The fix for your example then is to only save the randef() terms to the elements that were not NA, like so:

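# Merge factor scores with input data
# Covariates passed to mixedmirt(); rows with any NA here are dropped from the model
covdata <- irtData[, c("tcid", "tc_sex", "tc_ageCentered")]
# TRUE for rows with complete covariate data (the rows the model retained)
pick <- rowSums(is.na(covdata)) == 0
# Assign scores to the retained rows only; the remaining rows stay NA
irtData$cbcl_externalizing[pick] <- randef(mixedEffectsModel)$Theta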
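The same padding pattern can be tried without fitting a model. The following is a minimal base-R sketch in which `dat`, its columns and the `theta` values are all hypothetical; `theta` stands in for randef(model)$Theta, assumed here to be a one-column matrix with one row per retained observation.

```r
# Toy stand-in for the input data (all names and values are hypothetical)
dat <- data.frame(id  = 1:6,
                  sex = c(0, 1, NA, 1, 0, NA),
                  age = c(-1.5, 0.2, 0.4, NA, 1.1, 0.3))

# Rows with complete covariates, i.e. the rows a model would retain
pick <- complete.cases(dat[, c("id", "sex", "age")])

# Stand-in for randef(model)$Theta: one score per retained row
theta <- matrix(c(0.8, -0.3, 1.1), ncol = 1)

# Guard against a silent length mismatch before assigning
stopifnot(nrow(theta) == sum(pick))

# Pad to full length: dropped rows keep NA, retained rows get their score
dat$score <- NA_real_
dat$score[pick] <- theta[, 1]
```

Creating the column as NA first makes the padding explicit, and the stopifnot() check fails loudly if the covariate columns used to build `pick` do not match the ones the model actually saw.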

HTH.

isaactpetersen commented 4 years ago

Great, thanks for your help!