philchalmers / mirt

Multidimensional item response theory
https://philchalmers.github.io/mirt/

mirt parameter estimation accuracy and stability when items are calibrated with 3pl and graded response models #196

Closed Shaojie-tw closed 3 years ago

Shaojie-tw commented 3 years ago

Hi Phil,

I need to calibrate tests with mixed-format items using the mirt package. Before doing so, I want to make sure that conclusions based on it are valid and reliable. I therefore ran a simple simulation of the accuracy and stability of mirt's estimates, using true item parameter values that are averages of the results from another simulation. However, I find that the bias, SE, and RMSE for b1 and b2 are unacceptably large: about 800, 18000, and 18000, respectively. I guess something is wrong with my simulation. Do you have any idea why this happened? Or how can I improve the performance of mirt when mixed-format items are calibrated simultaneously? The R code, related data, and results are attached. Thanks a lot!

errors of 3plm and grm estimation with mirt.zip
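
For readers following along, the three summary statistics mentioned above can be sketched as follows. This is a minimal illustration in Python, not the attached R code, which is not reproduced here. The function name `summarize_recovery` and all replicate values are made up for the example. It shows how a single runaway estimate among otherwise reasonable replications can dominate bias, SE, and RMSE.

```python
import math
import statistics

def summarize_recovery(estimates, true_value):
    """Return (bias, se, rmse) for replicated estimates of one parameter.

    se is the empirical standard deviation across replications
    (population form), so rmse**2 == bias**2 + se**2.
    """
    bias = statistics.fmean(estimates) - true_value
    se = statistics.pstdev(estimates)
    rmse = math.sqrt(statistics.fmean([(e - true_value) ** 2 for e in estimates]))
    return bias, se, rmse

# Hypothetical replicate estimates of a difficulty parameter (true value 0.6).
true_b = 0.6
well_behaved = [0.55, 0.62, 0.58, 0.66, 0.59]
# Same replications, but the last one has diverged.
with_outlier = well_behaved[:-1] + [4000.0]

for label, estimates in (("well behaved", well_behaved), ("one runaway", with_outlier)):
    bias, se, rmse = summarize_recovery(estimates, true_b)
    print(f"{label:>12}: bias = {bias:.3f}, se = {se:.3f}, rmse = {rmse:.3f}")
```

Because the mean and the standard deviation are both sensitive to extreme values, it can be worth inspecting the individual replicate estimates before interpreting the averages.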

philchalmers commented 3 years ago

Can you provide a single, simple instance of this event? I don't see the need to execute your entire simulation just to replicate the issue. Also, the simulation code looks a little messy from the reader's perspective, so you might consider looking into the SimDesign package to help clean things up.

Shaojie-tw commented 3 years ago

Thanks for your advice. While simplifying the R code, I tried another, more reasonable set of true values. It turned out that the bias, SE, and RMSE were within the normal range, for example, less than 0.1. I guess the reason for the anomaly in the previous R code is that the true values for a, b1, and b2 were all extremely similar across items, causing much larger estimation errors. Thanks again!

philchalmers commented 3 years ago

That sounds unusual, as what you described sounded more like a discrimination parameter being estimated too close to 0 (which causes the difficulty parameters to tend towards infinity), but I'm glad you resolved your issue nonetheless. Cheers.
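
The mechanism described here can be sketched numerically. The example assumes the slope-intercept form a*theta + d, in which the traditional difficulty is recovered as b = -d / a (the kind of conversion requested with `coef(mod, IRTpars = TRUE)` in mirt). The function name and the intercept value below are hypothetical, chosen only to illustrate the division.

```python
def difficulty_from_intercept(a, d):
    """Convert slope-intercept parameters (a, d) to difficulty b = -d / a."""
    return -d / a

# Hypothetical intercept; with a = 1.5 it corresponds to b = 0.6.
d = -0.9

# As the estimated slope approaches 0, b grows without bound,
# and a slightly negative slope flips its sign.
for a in (1.5, 0.5, 0.01, 0.0001, -0.0001):
    b = difficulty_from_intercept(a, d)
    print(f"a = {a:>8}: b = {b:.1f}")
```

A handful of replications with slopes this close to 0 would be enough to push averaged bias and RMSE for b1 and b2 into the thousands, even if every other replication recovered the parameters well.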

Shaojie-tw commented 3 years ago

Yes, you're right. After checking the estimated values, I find that some estimated discrimination parameters are very close to 0, and even worse, some of them are negative. I therefore speculate that this is caused by the abnormal settings of the true parameters. In detail, I simulated a 50-item test with 40 dichotomous and 10 polytomous items. All discrimination parameters are roughly equal to 1.5, the difficulty parameter for the dichotomous items is 0, the step difficulty parameters for the polytomous items are -0.6 and 0.6, respectively, and the pseudo-guessing parameter is 0.25. By the way, I find the SimDesign package a very useful and powerful tool, and I'm trying to learn how to use it to aid my research. I have shared it with my colleagues. Thanks again!
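
The design described above can be sketched as a small data generator. This is a stdlib-only Python illustration, not the original R simulation. Several details are assumptions made for the example: abilities are drawn from a standard normal distribution, the sample size of 1000 is arbitrary (the thread does not state one), every discrimination is set to exactly 1.5 rather than "roughly" 1.5, and all function names are hypothetical.

```python
import math
import random

def p_3pl(theta, a, b, g):
    """3PL probability of a correct response."""
    return g + (1.0 - g) / (1.0 + math.exp(-a * (theta - b)))

def sample_graded(theta, a, bs, rng):
    """Sample a category 0..len(bs) from the graded response model."""
    # Cumulative probabilities P(X >= k), decreasing in k when bs is increasing.
    cum = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs]
    u = rng.random()
    # The sampled category is the number of thresholds that u falls under.
    return sum(u < p for p in cum)

def simulate_responses(n_persons, seed=0):
    """Simulate 40 3PL items followed by 10 three-category graded items."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # assumed N(0, 1) ability distribution
        dich = [int(rng.random() < p_3pl(theta, 1.5, 0.0, 0.25)) for _ in range(40)]
        poly = [sample_graded(theta, 1.5, (-0.6, 0.6), rng) for _ in range(10)]
        data.append(dich + poly)
    return data

responses = simulate_responses(1000, seed=1)  # sample size is an assumption
print(len(responses), len(responses[0]))
```

Under this design every item of a given type has identical true parameters, so the items differ only through sampling noise.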

philchalmers commented 3 years ago

No problem, and thanks for clarifying.

If it helps you get started with SimDesign, I've made a selection of wiki examples to get users started, and to demonstrate some real-world simulation examples. It's located here.

Shaojie-tw commented 3 years ago

Thanks a lot!