philchalmers / mirt

Multidimensional item response theory
https://philchalmers.github.io/mirt/

Sample code for the fixedCalib() #200

Closed czopluoglu closed 3 years ago

czopluoglu commented 3 years ago

Hi Phil,

I had to recently use the fixedCalib() function in mirt. Thanks for making it available. It was very much needed.

I think your sample code should be updated to avoid confusion. In the current sample code in the mirt manual, it appears that you remove the data for the pre-calibrated items from the new sample (dataset2). This works fine in your case because you simulate both the old and new datasets from N(0,1). However, if I simulate the new sample from a different mean and SD, e.g. N(1,1.5), removing the pre-calibrated items' data from the new sample produces biased estimates; the results come out as if this were an independent calibration. Including this data in the estimation for the new sample avoids that issue. So, to prevent confusion and misuse, I think the sample code could be modified along the lines of the following.

I may be wrong, but this has been my experience after trying it a few times.

set.seed(12345)
J <- 50
a <- matrix(abs(rnorm(J,1,.3)), ncol=1)
d <- matrix(rnorm(J,0,.7),ncol=1)
itemtype <- rep('2PL', nrow(a))

# calibration data theta ~ N(0,1)

dataset1 <- simdata(a, d, itemtype=itemtype, Theta=as.matrix(rnorm(3000, 0, 1)))

# new data (theta ~ N(1,1.5))

dataset2 <- simdata(a, d, itemtype=itemtype, Theta = as.matrix(rnorm(1000,1,1.5)))

# last 40% of experimental items not given to calibration group
# (unobserved; hence removed)

dataset1 <- dataset1[,-c(J:(J*.6))]

# Don't remove the pre-calibrated items from the new sample data
# They should be included in the estimation

mod <- mirt(dataset1, model = 1)
coef(mod, simplify=TRUE)

# Multiple Prior Weights Updating and Multiple EM Cycles (MWU-MEM)

MWU_MEM <- fixedCalib(dataset2, model = 1, old_mod = mod)
coef(MWU_MEM, simplify=TRUE)
data.frame(coef(MWU_MEM, simplify=TRUE)$items[,c('a1','d')], pop_a1=a, pop_d=d)
plot(MWU_MEM, type = 'empiricalhist')
philchalmers commented 3 years ago

Hi Cengiz,

Thanks for the report, and for trying out the new function, though I don't quite follow what the issue is here, in that the old data is not discarded at all in the default MWU-MEM algorithm. You can see that it uses the fulldata object on this line https://github.com/philchalmers/mirt/blob/aa489e30b807c6a6ac0e10c6c5b75097753a7b61/R/fixedCalib.R#L160, so the mirt() model uses both the old and new data when obtaining the new estimates.

In any event, this likely indicates a poor situation for fixed-item calibration, since the new population is wildly different from the calibrated one, and it should probably be inspected via a multiple-group analysis instead (there is clearly distributional impact). At least, that's what Kim (2006) reported in the study this function was based on. Does that help explain the unusual output?
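For reference, a multiple-group inspection could look something like the following sketch. It assumes responses from both samples have been stacked row-wise into one matrix with matching columns (the dat object and group sizes here are illustrative placeholders, not taken from the manual); multipleGroup() is mirt's multiple-group estimation function:

```r
# Sketch: check for distributional impact with a multiple-group model.
# 'dat' stacks the old and new samples row-wise (matching columns, NAs
# for unadministered items); 'group' labels each row's sample.
library(mirt)
group <- c(rep('old', 3000), rep('new', 1000))   # illustrative sizes
# hold item parameters equal across groups, but freely estimate the
# new group's latent mean and variance
mg <- multipleGroup(dat, 1, group = group,
                    invariance = c('slopes', 'intercepts',
                                   'free_means', 'free_var'))
coef(mg, simplify = TRUE)   # inspect the estimated group means/variances
```

If the freely estimated mean and variance for the new group depart substantially from (0, 1), that is direct evidence of the distributional impact described above.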

czopluoglu commented 3 years ago

Hi Phil,

Sorry for not communicating well. The issue is the example code provided on the help page. Please see below. This is copy/paste from the help page for the fixedCalib() function in the most recent mirt manual on CRAN.

## Not run:
# single factor
set.seed(12345)
J <- 50
a <- matrix(abs(rnorm(J,1,.3)), ncol=1)
d <- matrix(rnorm(J,0,.7),ncol=1)
itemtype <- rep('2PL', nrow(a))
# calibration data theta ~ N(0,1)
N <- 3000
dataset1 <- simdata(a, d, N = N, itemtype=itemtype)
# new data (again, theta ~ N(0,1))
dataset2 <- simdata(a, d, N = 1000, itemtype=itemtype)
# last 40% of experimental items not given to calibration group
# (unobserved; hence removed)
dataset1 <- dataset1[,-c(J:(J*.6))]
head(dataset1)
# assume first 60% of items not given to new group
dataset2[,colnames(dataset1)] <- NA
head(dataset2)

At the end of the code above, you remove the data for the pre-calibrated items from dataset2 (the new sample). You then provide dataset2, without the pre-calibrated items, as input to the function.

MWU_MEM <- fixedCalib(dataset2, model = 1, old_mod = mod)

This amounts to an independent calibration and produces biased estimates.

What I was trying to say is that we should not remove the pre-calibrated items from dataset2; there have to be some common items in the new sample's data to link the scales. So, I think the following lines in your example code are unnecessary and should be removed. The dataset2 input to fixedCalib() should include all of the new sample's data, including the responses to the pre-calibrated items.

# assume first 60% of items not given to new group
dataset2[,colnames(dataset1)] <- NA
head(dataset2)

In your example code this is not a big deal, because you simulate both the old and new data from N(0,1). The estimates come out fine without the pre-calibrated items because the two groups are so similar.

However, if you run the example code after changing the ability distribution for the new group to, say, N(1,1.5), the code on your help page produces biased estimates. The d-parameters for the new items, for instance, are increased by about 1 point. See below.

set.seed(12345)
J <- 50
a <- matrix(abs(rnorm(J,1,.3)), ncol=1)
d <- matrix(rnorm(J,0,.7),ncol=1)
itemtype <- rep('2PL', nrow(a))

# calibration data theta ~ N(0,1)

dataset1 <- simdata(a, d, itemtype=itemtype, Theta=as.matrix(rnorm(3000, 0, 1)))

# new data (theta ~ N(1,1.5))

dataset2 <- simdata(a, d, itemtype=itemtype, Theta = as.matrix(rnorm(1000,1,1.5)))

# last 40% of experimental items not given to calibration group
# (unobserved; hence removed)

dataset1 <- dataset1[,-c(J:(J*.6))]

# assume first 60% of items not given to new group
dataset2[,colnames(dataset1)] <- NA

mod <- mirt(dataset1, model = 1)
coef(mod, simplify=TRUE)

# Multiple Prior Weights Updating and Multiple EM Cycles (MWU-MEM)

MWU_MEM <- fixedCalib(dataset2, model = 1, old_mod = mod)
data.frame(coef(MWU_MEM, simplify=TRUE)$items[,c('a1','d')], pop_a1=a, pop_d=d)

               a1           d    pop_a1       pop_d
Item_1  1.1961890 -0.36582599 1.1756586 -0.37827025
Item_2  1.2180200  1.34513036 1.2128398  1.36338487
Item_3  1.0287076 -0.01749165 0.9672090  0.03751319
Item_4  0.8169838  0.22955350 0.8639508  0.24616399
Item_5  1.1477420 -0.46213650 1.1817662 -0.46968358
Item_6  0.4734409  0.15748181 0.4546132  0.19456759
Item_7  1.1723767  0.49084966 1.1890296  0.48381989
Item_8  0.8823964  0.59861143 0.9171448  0.57665673
Item_9  0.7320060  1.36669674 0.9147521  1.50154551
Item_10 0.7374958 -1.70288028 0.7242034 -1.64286078
Item_11 0.9877230  0.05684034 0.9651257  0.10471439
Item_12 1.6208445 -0.88830542 1.5451936 -0.93977204
Item_13 1.2113938  0.36247319 1.1111884  0.38731215
Item_14 1.2608594  1.10068854 1.1560649  1.11297399
Item_15 0.7483149 -0.31912299 0.7748404 -0.41081572
Item_16 1.1708992 -1.24683338 1.2450700 -1.28266411
Item_17 0.7422645  0.63674114 0.7340927  0.62169760
Item_18 0.9248296  1.12848731 0.9005267  1.11544193
Item_19 1.4767582  0.39539515 1.3362138  0.36179827
Item_20 1.0368814 -0.99330414 1.0896171 -0.90697018
Item_21 1.1357543  0.06752098 1.2338866  0.03823090
Item_22 1.4021119 -0.51282038 1.4367355 -0.54925456
Item_23 0.6786602 -0.68438097 0.8067015 -0.73454697
Item_24 0.4828002  1.68992778 0.5340588  1.63135837
Item_25 0.5011412  1.04812815 0.5206871  0.98189377
Item_26 1.6039924  0.63625385 1.5415293  0.65982060
Item_27 0.8328685  0.55938549 0.8555058  0.57838080
Item_28 1.1001393 -0.49299690 1.1861139 -0.56807834
Item_29 1.1996287  0.35417420 1.1836370  0.33337380
Item_30 1.5200340  1.85724739 0.9513067  0.71488089
Item_31 1.8078888  1.70729707 1.2435620  0.45176815
Item_32 2.4375267  2.64819557 1.6590501  0.73020049
Item_33 2.2986334  1.59162595 1.6147571 -0.21305838
Item_34 2.1651815  3.21818310 1.4897337  1.73397764
Item_35 1.3867267  1.75218630 1.0762814  0.67985447
Item_36 1.5204315  2.48563358 1.1473565  1.30696943
Item_37 1.1744869  1.39428643 0.9027740  0.47042973
Item_38 0.9265040  0.21581371 0.5013849 -0.21556737
Item_39 2.2627319  1.96012197 1.5303202  0.37556660
Item_40 1.6321151  1.85524689 1.0077403  0.57740905
Item_41 1.8164580  0.75312923 1.3385533 -0.67473104
Item_42 0.3682513 -0.26137133 0.2858926 -0.59855776
Item_43 1.1010880  2.12200708 0.6819203  1.32086286
Item_44 1.9238829  1.12528676 1.2811422 -0.27427356
Item_45 1.5615960  0.48165387 1.2563355 -0.68644306
Item_46 2.1254314  1.91967460 1.4382188  0.48113247
Item_47 0.8343769  0.27575483 0.5760704 -0.35353046
Item_48 1.8830552  2.84104465 1.1702210  1.51040387
Item_49 1.8911639  0.88518795 1.1749563 -0.41985829
Item_50 0.8895868  0.21840283 0.6079603 -0.48618269

However, if you don't remove the pre-calibrated items from the new sample, the estimation is just fine. Below is the same code with those lines removed.

set.seed(12345)
J <- 50
a <- matrix(abs(rnorm(J,1,.3)), ncol=1)
d <- matrix(rnorm(J,0,.7),ncol=1)
itemtype <- rep('2PL', nrow(a))

# calibration data theta ~ N(0,1)

dataset1 <- simdata(a, d, itemtype=itemtype, Theta=as.matrix(rnorm(3000, 0, 1)))

# new data (theta ~ N(1,1.5))

dataset2 <- simdata(a, d, itemtype=itemtype, Theta = as.matrix(rnorm(1000,1,1.5)))

# last 40% of experimental items not given to calibration group
# (unobserved; hence removed)

dataset1 <- dataset1[,-c(J:(J*.6))]

mod <- mirt(dataset1, model = 1)
coef(mod, simplify=TRUE)

# Multiple Prior Weights Updating and Multiple EM Cycles (MWU-MEM)

MWU_MEM <- fixedCalib(dataset2, model = 1, old_mod = mod)
data.frame(coef(MWU_MEM, simplify=TRUE)$items[,c('a1','d')], pop_a1=a, pop_d=d)

              a1           d    pop_a1       pop_d
Item_1  1.1961890 -0.36582599 1.1756586 -0.37827025
Item_2  1.2180200  1.34513036 1.2128398  1.36338487
Item_3  1.0287076 -0.01749165 0.9672090  0.03751319
Item_4  0.8169838  0.22955350 0.8639508  0.24616399
Item_5  1.1477420 -0.46213650 1.1817662 -0.46968358
Item_6  0.4734409  0.15748181 0.4546132  0.19456759
Item_7  1.1723767  0.49084966 1.1890296  0.48381989
Item_8  0.8823964  0.59861143 0.9171448  0.57665673
Item_9  0.7320060  1.36669674 0.9147521  1.50154551
Item_10 0.7374958 -1.70288028 0.7242034 -1.64286078
Item_11 0.9877230  0.05684034 0.9651257  0.10471439
Item_12 1.6208445 -0.88830542 1.5451936 -0.93977204
Item_13 1.2113938  0.36247319 1.1111884  0.38731215
Item_14 1.2608594  1.10068854 1.1560649  1.11297399
Item_15 0.7483149 -0.31912299 0.7748404 -0.41081572
Item_16 1.1708992 -1.24683338 1.2450700 -1.28266411
Item_17 0.7422645  0.63674114 0.7340927  0.62169760
Item_18 0.9248296  1.12848731 0.9005267  1.11544193
Item_19 1.4767582  0.39539515 1.3362138  0.36179827
Item_20 1.0368814 -0.99330414 1.0896171 -0.90697018
Item_21 1.1357543  0.06752098 1.2338866  0.03823090
Item_22 1.4021119 -0.51282038 1.4367355 -0.54925456
Item_23 0.6786602 -0.68438097 0.8067015 -0.73454697
Item_24 0.4828002  1.68992778 0.5340588  1.63135837
Item_25 0.5011412  1.04812815 0.5206871  0.98189377
Item_26 1.6039924  0.63625385 1.5415293  0.65982060
Item_27 0.8328685  0.55938549 0.8555058  0.57838080
Item_28 1.1001393 -0.49299690 1.1861139 -0.56807834
Item_29 1.1996287  0.35417420 1.1836370  0.33337380
Item_30 1.1243474  0.82967436 0.9513067  0.71488089
Item_31 1.3199209  0.49270276 1.2435620  0.45176815
Item_32 1.8287791  1.01313448 1.6590501  0.73020049
Item_33 1.7144799  0.04709175 1.6147571 -0.21305838
Item_34 1.5832051  1.75263897 1.4897337  1.73397764
Item_35 1.0561923  0.80763936 1.0762814  0.67985447
Item_36 1.1348785  1.45958503 1.1473565  1.30696943
Item_37 0.8893722  0.58732001 0.9027740  0.47042973
Item_38 0.6609398 -0.41046199 0.5013849 -0.21556737
Item_39 1.5994091  0.45281708 1.5303202  0.37556660
Item_40 1.1216407  0.76567084 1.0077403  0.57740905
Item_41 1.2649303 -0.43097664 1.3385533 -0.67473104
Item_42 0.2721145 -0.52431542 0.2858926 -0.59855776
Item_43 0.7683443  1.37829594 0.6819203  1.32086286
Item_44 1.4030868 -0.16266509 1.2811422 -0.27427356
Item_45 1.2117236 -0.62384842 1.2563355 -0.68644306
Item_46 1.5275409  0.49861371 1.4382188  0.48113247
Item_47 0.5587936 -0.25920072 0.5760704 -0.35353046
Item_48 1.4404876  1.58233134 1.1702210  1.51040387
Item_49 1.2938777 -0.33631827 1.1749563 -0.41985829
Item_50 0.6412528 -0.38919806 0.6079603 -0.48618269

The issue is just about improving the example code on the help page. If students or others unfamiliar with these procedures follow the current example, they may remove the pre-calibrated data from the new sample (as is done on the help page), and this will yield biased parameter estimates without their being aware of it. dataset2 should include data for at least a few common items (technically at least one) so that the parameters are obtained on the same scale. Just a cautionary step.

Hope this clarifies the issue.

Thank you.

philchalmers commented 3 years ago

Hi Cengiz,

Thank you very much for clarifying this issue. In principle, I completely agree with you: the examples should be updated to reflect that a more optimal item-design strategy would include overlapping item responses (i.e., common items shared across participants). I'll be sure to patch this up and include some more interesting setups as well, where participants who received the experimental items may not have received all of the pre-calibrated items (e.g., all participants saw a test with 50 items, but were given 40 random pre-calibrated items and 10 random experimental items). However, I'm tempted to leave the current example in and just label the setup as 'extreme' or something, since it's interesting that it has the potential to work at all!
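For concreteness, that kind of planned-missingness design could be sketched in base R as follows (the item counts, column names, and random 0/1 responses are purely illustrative placeholders; real responses would come from simdata() or an administration):

```r
# Hypothetical sketch: 50 pre-calibrated and 20 experimental items; each
# of 1000 participants is administered 40 random pre-calibrated items and
# 10 random experimental items; unadministered responses are set to NA.
set.seed(1)
N <- 1000; n_pre <- 50; n_exp <- 20
full <- matrix(sample(0:1, N * (n_pre + n_exp), replace = TRUE),
               nrow = N,
               dimnames = list(NULL, c(paste0('Pre_', 1:n_pre),
                                       paste0('Exp_', 1:n_exp))))
for (i in 1:N) {
    drop_pre <- sample(n_pre, n_pre - 40)            # 10 unseen pre items
    drop_exp <- n_pre + sample(n_exp, n_exp - 10)    # 10 unseen exp items
    full[i, c(drop_pre, drop_exp)] <- NA
}
# every participant now has exactly 50 observed responses
table(rowSums(!is.na(full)))
```

Because every participant responds to a random subset of the pre-calibrated items, every experimental item ends up linked to the calibrated scale through many overlapping common items.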

Part of my confusion stems from the setup in Kim's 2006 paper, which didn't actually mention 'common' items shared between the old and new respondents (I suppose the implicit default was that all items from the pre-calibrated test were also administered to the individuals who took the experimental items?). New participants taking all of the pre-calibrated items as well as the experimental items does seem like a reasonable setup to me empirically, though it's rather interesting that the idea can be taken to the limit where none of the items have to be shared (which will work so long as the latent trait distributions match). I don't quite agree with your point that this is equivalent to independent calibrations, though, since the posterior theta distribution from the pre-calibrated data is still used as the prior for the individuals responding to the to-be-calibrated items; but I do agree that without common items to link the scales, noticeable bias is much more possible and indeed expected.

In any event, thanks again. I'll push an update to the examples that will close this issue shortly. Cheers.

czopluoglu commented 3 years ago

Sounds great! Thanks for all your work. Whenever I feel I'm in trouble and unsure how to implement something, I find it right there somewhere in the mirt manual. The Kim (2006) paper was one of those cases. I really appreciate your including this function in the package.