philchalmers / mirt

Multidimensional item response theory
https://philchalmers.github.io/mirt/

multipleGroup with difficult design seems to give an incorrect result #180

Closed jessekps closed 4 years ago

jessekps commented 4 years ago

Hi again,

I got an unexpected result when calibrating a 2PL with mirt. I replicated it with simulated data in a 1PL and simplified it as far as I could, but unfortunately it is still not very minimal and takes quite a while to compute.

It is a bit of a nasty design, with the items unique to the first group fixed and a relatively weak link between groups. The results are clearly off. This could be a bug, or I may have misunderstood the invariance argument, the parametrisation, or how to fix parameters (without fixed parameters, the results seem better). I am able to get a correct result with CML in dexter (shown) or MML using TAM (not shown; I can add it if you like).

library(dplyr)
library(dexter)
library(mirt)
library(stringr)  # provides str_extract(), used below

set.seed(123)

group = c(rep('A',2000),rep('B',500),rep('C',300),rep('D',10000),rep('E',20000))
mu = runif(5,-1,1)
sigma = runif(5,0.5,2)
theta = unlist(mapply(rnorm,table(group),mu,sigma))

items=tibble(item_id=sprintf('i%03i',1:100), item_score=1,beta=runif(100,-1,1))

# simulate
dat = r_score(items)(theta)
# make incomplete design
dat_incl = dat
dat_incl[group=='A',31:100] = NA
dat_incl[group=='B',c(1:20,51:100)] = NA
dat_incl[group=='C',c(1:40,71:100)] = NA
dat_incl[group=='D',c(1:60,91:100)] = NA
dat_incl[group=='E',1:80] = NA

# fix items unique to A
fixed=slice(items,1:20)

# dexter
f = fit_enorm(dat_incl, fixed_params=fixed)

# do mirt
# get starting values to fix params
m_rasch = multipleGroup(data = dat_incl, itemtype = 'Rasch', model = 1,group = group,
                        invariance = c("free_means", "free_var", "slopes", "intercepts"),
                        pars = "values")

for (i in 1:nrow(fixed)) {
  m_rasch[m_rasch$item == fixed[i, ]$item_id & m_rasch$name == "d", "value"] = -fixed[i, ]$beta
  m_rasch[m_rasch$item == fixed[i, ]$item_id & m_rasch$name == "d", "est"] = FALSE
}
m_rasch[m_rasch$name == "MEAN_1", ]$est = TRUE
m_rasch[m_rasch$name == "COV_11", ]$est = TRUE

# for real, takes a while
est_rasch = multipleGroup(data = dat_incl, itemtype = 'Rasch', model = 1,  group = group,
                          invariance = c("free_means", "free_var", "slopes", "intercepts"),
                          pars = m_rasch)

par_rasch = coef(est_rasch, as.data.frame = TRUE, IRTpars = TRUE)[[1]] %>%
  as_tibble(rownames = "item_id") %>%
  filter(grepl('\\.b$',item_id)) %>%
  mutate(item_id=str_extract(item_id,'^[^\\.]+'))

par(mfrow=c(2,2))
tst = inner_join(par_rasch, coef(f))
plot(tst$beta,tst$par,xlab='dexter',ylab='mirt')  # beta comes from dexter, par from mirt
abline(0,1)

tst2 = inner_join(par_rasch, items)
plot(tst2$beta,tst2$par,xlab='true',ylab='mirt')
abline(0,1)

tst3 = inner_join(coef(f), items,by=c('item_id','item_score'))
plot(tst3$beta.y,tst3$beta.x,xlab='true',ylab='dexter')
abline(0,1)

[Figure: scatter plots of item difficulties: mirt vs. dexter, mirt vs. true, dexter vs. true]

philchalmers commented 4 years ago

These aren't the same models being fitted: one is a multiple-group model, the other a single-group model. I also don't think the model is identified when the first group's latent mean is freely estimated, in which case you should also generate the first group's data with a latent-trait mean of 0 for a nicer calibration.

What are you trying to do exactly? To me it seems like you're trying to do a fixed-item calibration, which really only requires the use of the mirt() function.
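A minimal sketch of what a single-group fixed-item calibration with mirt() could look like. It uses small simulated toy data rather than the data from this issue; the item count, sample size, anchor set, and generating mean and variance are all illustrative assumptions, and the item names are whatever simdata() assigns.

```r
library(mirt)

set.seed(1)
# Toy data: 10 Rasch items, abilities drawn with mean 0.5 and variance 1.5
# (all values are illustrative assumptions, not taken from the issue)
a <- matrix(1, 10, 1)
d <- matrix(seq(-1, 1, length.out = 10), 10, 1)
dat <- simdata(a, d, N = 1000, itemtype = 'dich', mu = 0.5, sigma = matrix(1.5))

# Parameter table for a single-group Rasch model
sv <- mirt(dat, 1, itemtype = 'Rasch', pars = 'values')

# Fix the intercepts of the first five items at their known values
anchor <- sv$item %in% colnames(dat)[1:5] & sv$name == 'd'
sv$value[anchor] <- d[1:5, 1]
sv$est[anchor] <- FALSE

# Free the latent mean and variance, since the anchor items now set the scale
sv$est[sv$name %in% c('MEAN_1', 'COV_11')] <- TRUE

mod <- mirt(dat, 1, itemtype = 'Rasch', pars = sv, verbose = FALSE)
coef(mod, simplify = TRUE)
```

The anchor intercepts stay at their supplied values, while the remaining intercepts and the latent mean and variance are estimated relative to them.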

philchalmers commented 4 years ago

I'm going to close this for now, as I see this less as a coding problem and more as a problem with mixing model specifications.

jessekps commented 4 years ago

I'm indeed trying to do a fixed-item calibration. The population is a mixture of 5 normals, each with its own mean and SD, and group membership is known. Item parameters are the same for all groups.

I thought that for this the parameters of all groups should be freely estimated, since the scale is identified by the fixed parameters for the first 20 items, so fixing any group parameters would interfere with that. I used multipleGroup since the population as a whole is not normally distributed and I don't see a way to supply that fact to the mirt function.
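The identification argument above rests on the location indeterminacy of the Rasch model: shifting every ability and every difficulty by the same constant leaves all response probabilities unchanged, so something (fixed item parameters or a fixed group mean) has to pin the scale. A small base R illustration, with arbitrary values chosen only for the example:

```r
# Rasch response probability for ability theta and difficulty b
p_rasch <- function(theta, b) plogis(theta - b)

theta <- c(-1, 0, 1.5)     # illustrative abilities
b     <- c(-0.5, 0.2, 0.8) # illustrative difficulties
shift <- 0.7               # arbitrary constant

# Probabilities for every person-item pair, before and after the shift
p_original <- outer(theta, b, p_rasch)
p_shifted  <- outer(theta + shift, b + shift, p_rasch)

# The two parameter sets imply the same response probabilities
print(isTRUE(all.equal(p_original, p_shifted)))
```

Fixing some difficulties removes this freedom only for respondents whose likelihood involves those items, directly or through a chain of shared items.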

If this is a case of me misunderstanding the function interface I can move this to the mirt google groups if you like.

thanks, jesse

philchalmers commented 4 years ago

I see what you're trying to do, but it might not work the way you're expecting. If the first 20 items are treated as fixed, then they really only apply to group A, as their item information is not used at all for groups B to E. You could fix these item parameters at any values and it wouldn't matter, since the NAs effectively exclude these items during the calibration within those groups. Some information does come from the 10 newly calibrated items in group A, which is obviously less than ideal. As such, the latent trait parameter estimates don't really get to 'borrow strength' from these fixed parameters, since the fixed items don't factor into the likelihood equations.

Treating the model as a single-group implementation fixes this issue, since the hyper-parameter information is borrowed from all items during estimation and there's less uncertainty associated with the latent trait parameters. And sure, if you'd like to continue this further we can move to the forum if you have more design-based questions. Also, if you have a published resource that discusses the method you're trying to execute, then that might be helpful to ease communication. HTH.
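The linking structure described here can be checked directly from the design. The sketch below uses base R only; the item ranges are read off the NA assignments in the simulation script, and the object names are introduced just for this illustration.

```r
# Items administered to each group, taken from the NA pattern in the script
design <- list(A = 1:30, B = 21:50, C = 41:70, D = 61:90, E = 81:100)
fixed_items <- 1:20  # items whose difficulties are fixed

# Number of fixed items each group actually answers
n_fixed <- sapply(design, function(items) length(intersect(items, fixed_items)))
print(n_fixed)

# Items shared by each pair of groups (diagonal = booklet length)
overlap <- sapply(design, function(x) {
  sapply(design, function(y) length(intersect(x, y)))
})
print(overlap)
```

Only group A answers any fixed item, and each pair of adjacent groups shares 10 items while non-adjacent groups share none, so the scale reaches group E only through a chain of four 10-item links.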

jessekps commented 4 years ago

I understand what you're saying; it's certainly a less than ideal design. I did not choose it, but I have to work with something only slightly less extreme in my current analysis, and CML doesn't help since a 2PL is desired. That said, I don't think using the mirt function with one normal distribution for the population is appropriate here either. Since the subpopulations differ with respect to ability and group membership determines which responses are missing, I think it becomes in effect a targeted design. I really don't know that much about MML estimation (and I certainly don't mean to lecture an expert), but the condition I'm worried about is described here:

Eggen, T. J. H. M., & Verhelst, N. D. (2011). Item calibration in incomplete testing designs. Psicológica, 32(1), 107-132. https://ris.utwente.nl/ws/portalfiles/portal/6592241/7EGGEN.pdf (see pages 119-121)

So I am trying to use a method where each subpopulation's mean and variance are freely estimated.

I said earlier that I was able to get the correct results in TAM, but I went through my code and that was with a different dataset. TAM gives exactly the same results as mirt in this case; my apologies for the mistake. So this is not an error but an impossible calibration design. Thanks for the explanation, and if I have further questions I'll move to the google groups.

kind regards, Jesse