wjakethompson / measr

R package for the Bayesian estimation of diagnostic classification models using Stan
https://measr.info
GNU General Public License v3.0

JOSS Review #38

Closed · seandamiandevine closed this 1 year ago

seandamiandevine commented 1 year ago

Hi! @chartgerink asked me to review this package for JOSS and to focus particularly on the Stan implementation. I am not a content expert on DCMs or IRT, so I will concentrate on the Bayesian modeling side. Overall, the code all seems appropriate. Below are some thoughts and questions I had. Some of these may be non-issues and stem from my ignorance of common practice in the field.

Open issues:

wjakethompson commented 1 year ago

Hi @seandamiandevine 👋! Thanks for the feedback. A few responses:

  1. Yes, the user is able to specify priors for each individual parameter. An example of specifying a prior for a specific item using the LCDM is included in the model estimation vignette, but I agree that this is confusing given the current help page for measrprior(). I've updated that documentation to include an example of setting a prior for a specific item (#39). A short sketch follows this list.
  2. Yes, these priors are commonly used in the DCM literature. The guessing parameter is the probability that you guess the right answer when you don't have the requisite knowledge. Most applications involve multiple-choice items with 4-5 response options, so we would expect the guessing parameters to be somewhere around .2-.25. The Beta(5, 25) prior has a large enough variance to cover these plausible values (see the quick check after this list). The slipping parameter is just the reverse (the probability that someone with the requisite knowledge gets the item wrong). In most applications, these parameters have a very similar distribution to the guessing parameter, so it gets the same default prior.
  3. The bounded interactions ensure monotonicity of the model. Take, for example, an item that measures two attributes (the arithmetic is checked in the sketch after this list):
    • Intercept = -0.50. This is the log odds that someone who is not proficient on either attribute provides a correct response. On the probability scale, this is a .38 probability of providing a correct response.
    • Main effect 1 = 1.25. The increase in the log odds of providing a correct response for someone who is proficient on attribute 1. On the probability scale, this would be .68 (i.e., inverse logit of -0.50 + 1.25). The main effect is constrained to be positive to ensure that someone who is proficient has a higher probability of a correct response than someone who is not.
    • Main effect 2 = 1.50. The increase in the log odds of providing a correct response for someone who is proficient on attribute 2. On the probability scale, this would be .73 (i.e., inverse logit of -0.50 + 1.50). Again, the main effect is constrained to be positive.
    • Now the interaction. This is the additional change in log odds for someone who is proficient on both attributes. The interaction must be greater than -1 times the smallest main effect (i.e., greater than -1.25 here); otherwise, someone who is proficient on both attributes could have a lower probability of providing a correct response than someone who is proficient on only one attribute. For example, assume there were no constraint and the interaction were estimated to be -1.35. The log odds of providing a correct response would then be -0.50 + 1.25 + 1.50 - 1.35 = 0.90. On the probability scale, this is a .71 probability of providing a correct response. That is, someone who is proficient on both attributes 1 and 2 has a lower probability of providing a correct response (.71) than someone who is proficient only on attribute 2 (.73). Conceptually, this doesn't make sense, so the constraints are in place to ensure monotonicity, meaning that the probability of providing a correct response always increases as additional attributes are acquired.
  4. I agree that the for loops are far from ideal. The problem I've run into trying to use an R x C matrix is that it requires responses from all respondents on all items (i.e., no missing data). Because I need to both allow for missing data and marginalize over the discrete parameters, looping over the observed person-item pairs was the best solution I could come up with (see a discussion of the issue on the Stan Discourse page); a toy illustration of that long-format setup follows this list. But if you come across any alternative solutions, I would love to be able to speed things up!
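
To make point 1 concrete, here is a rough sketch of global versus item-specific priors. The distribution strings follow the pattern on the measrprior() help page; the coefficient name "l1_0" for the item 1 intercept is an assumed example, so check the documentation for the exact naming convention in your model.

```r
library(measr)

# Prior applied to all item intercepts
measrprior(prior = "normal(0, 2)", class = "intercept")

# Prior for one specific parameter: the intercept of item 1.
# NOTE: the coefficient name "l1_0" is an assumed example; see the
# measrprior() help page for the naming convention in your model.
measrprior(prior = "normal(0, 2)", class = "intercept", coef = "l1_0")
```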
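For point 2, a quick check in R of what the default Beta(5, 25) prior implies about the guessing parameter:

```r
# Implied mean and 95% interval of the Beta(5, 25) guessing prior
a <- 5
b <- 25
a / (a + b)                    # prior mean, about 0.17
qbeta(c(0.025, 0.975), a, b)   # roughly 0.06 to 0.32, covering .2-.25
```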
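The arithmetic in point 3 can be verified directly; plogis() is the inverse logit in base R:

```r
intercept <- -0.50
main1 <- 1.25   # main effect for attribute 1
main2 <- 1.50   # main effect for attribute 2

plogis(intercept)           # not proficient on either attribute: ~.38
plogis(intercept + main1)   # proficient on attribute 1 only: ~.68
plogis(intercept + main2)   # proficient on attribute 2 only: ~.73

# With an unconstrained interaction below -min(main1, main2),
# monotonicity breaks:
interaction <- -1.35
plogis(intercept + main1 + main2 + interaction)  # both attributes: ~.71
```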
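And for point 4, a toy illustration (not measr's internal code) of the long-format workaround: one row per observed person-item pair, so missing responses simply drop out before the data reach Stan.

```r
library(tidyr)
library(dplyr)

wide <- tibble(
  person = 1:3,
  item_1 = c(1, 0, NA),   # person 3 skipped item 1
  item_2 = c(1, NA, 0)    # person 2 skipped item 2
)

long <- wide |>
  pivot_longer(starts_with("item_"),
               names_to = "item", values_to = "score") |>
  drop_na(score)   # Stan then loops over these observed rows
```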
seandamiandevine commented 1 year ago

Thanks Jake. This makes sense to me. I have some thoughts about vectorizing, but I will think about them more deeply and open another issue (outside of the scope of this review) if need be.

Great job and great package!