Closed Sinan-Yavuz closed 3 years ago
Hi @Sinan-Yavuz, thank you for your contribution! We're gonna take a look at the issue and implement the necessary changes ASAP.
In the meantime, it'd be great if you could provide us with some examples why the current implementation of the function doesn't work, so we can add them to our unit test battery.
Hi @wleoncio, thank you very much for your quick reply.
Here is the example;
response_gen2 <- function (subject, item, theta, a_par = NULL, b_par, c_par = NULL,
d_par = NULL, item_no = NULL, ogive = "Logistic")
{
if (length(subject) != length(item))
stop("subject and item vectors must be equal length.",
call. = FALSE)
if (is.null(a_par))
a_par <- rep(1, length(unique(item)))
if (is.null(c_par))
c_par <- rep(0, length(unique(item)))
if (ogive == "Logistic")
DD <- 1
if (ogive == "Normal")
DD <- 1.7
if (is.null(item_no))
item_no <- seq(length(unique(item)))
if (is.null(d_par))
b_pars <- split(b_par, seq(length(b_par)))
if (!is.null(d_par)) {
d_mat <- do.call("cbind", d_par)
b_pars <- list()
for (i in 1:length(b_par)) {
if (sum(abs(d_mat[i, ])) != 0) {
b_list <- list()
for (j in 1:length(d_mat[i, ])) b_list[[j]] <- b_par[i] +
d_mat[i, j]
b_pars[[i]] <- unlist(b_list)
}
if (sum(abs(d_mat[i, ])) == 0)
b_pars[[i]] <- b_par[i]
}
}
names(b_pars) <- item_no
y <- numeric(length(subject))
for (n in 1:length(subject)) {
y[n] <- irt_gen2(theta = theta[subject[n]], a_par = a_par[which(item_no ==
item[n])], b_par = b_pars[[which(item_no == item[n])]],
c_par = c_par[which(item_no == item[n])], D = DD)
}
df_l <- data.frame(item = item, subject = subject, response = y)
df_w <- reshape(df_l, timevar = "item", idvar = "subject",
direction = "wide")
df_item_old <- colnames(df_w)[2:length(df_w)]
df_item_num <- gsub("[^[:digit:]]", "", df_item_old)
df_item_new <- ifelse(nchar(df_item_num) == 1, paste0("i00",
df_item_num), ifelse(nchar(df_item_num) == 2, paste0("i0",
df_item_num), paste0("i", df_item_num)))
colnames(df_w)[2:length(df_w)] <- df_item_new
df_w <- df_w[, order(names(df_w))]
rownames(df_w) <- NULL
return(y = df_w)
}
irt_gen2 <- function (theta, a_par = 1, b_par, c_par = 0, D = 1)
{
response_pr <- c_par + (1 - c_par) * (exp(a_par*(theta - b_par)) / (1 + exp(a_par*(theta - b_par))))
y <- rbinom(1, size = 1, prob = response_pr)
return(y)
}
set.seed(345)
#this is a normal example
subject <- 1:100
theta <- rnorm(100,0,1)
student_dt <- data.frame(subject, theta)
item <- as.integer(c(1:n_items))
a <- runif(n_items, 0.5, 1.5)
b <- runif(n_items, -3, 3)
c <- runif(n_items, 0, .15)
item_par <- data.frame(item, a, b, c)
resp_matrix <- response_gen(subject = sort(rep(subject, 30)), item = rep(item,100), theta = theta, a_par = a, b_par = b, c_par = c)
#item means make sense and everything looks fine
colMeans(resp_matrix[item])
lm(colMeans(resp_matrix[item]) ~ b)
#Let's make it very extreme, student mean theta is -20, I don't expect them to solve anything. But guessing parameters are super high. So, regardles if their theta 80%-90% of them should solve these items.
subject <- 1:100
theta <- rnorm(100, -20 ,1)
student_dt <- data.frame(subject, theta)
item <- as.integer(c(1:n_items))
a <- runif(n_items, 0.5, 1.5)
b <- runif(n_items, -3, 3)
#chance is super high
c <- runif(n_items, 0.8, .9)
item_par <- data.frame(item, a, b, c)
resp_matrix <- response_gen(subject = sort(rep(subject, 30)), item = rep(item,100), theta = theta, a_par = a, b_par = b, c_par = c)
colMeans(resp_matrix[item]) #look at the results, they are around 50%, what happened to .8-.9 guessing parameter
lm(colMeans(resp_matrix[item]) ~ b)
#let's check after the fix - but it only works for dichotomous variables
resp_matrix2 <- response_gen2(subject = sort(rep(subject, 30)), item = rep(item,100), theta = theta, a_par = a, b_par = b, c_par = c)
colMeans(resp_matrix2[item]) #look at the results, they are around 50%, what happened to .8-.9 guessing parameter
lm(colMeans(resp_matrix2[item]) ~ b)
Hi, I worked to identify this bug with @Sinan-Yavuz and the way you can tell this is a bug is that the intercept of the lm should move up as the guessing parameter increases.
Your formula works so long as the c parameter is 0. When it is not zero you make the wrong probability of a zero
The problem is that the binary is being treated like the polytomous case. In the binary case there is a guessing parameter and you do not want to divide by the denominator, the second element of the numerator is already the probability of a 1. c_par decreases, not increases the probability of a zero, but your formula increases it.
Thanks, Paul and Sinan - that's super helpful. We'll sort this out. Glad you are using lsasim!
OK, so I've created a markdown document based on the code above. Please check it here and let me know if the output conforms to your expectations.
As for implementing the changes, I am not sure how to proceed. The irt_gen2()
function proposed in the OP doesn't break any of the current unit tests, but I understand it only works for dichotomous variables, so I am not sure if it can simply replace irt_gen()
and call it a day. I'd also like to understand a bit better the differences between lsasim::response_gen()
and the response_gen2()
function from your MRE.
I think you may want to break up dichotomous from polytomous cases. I've never seen a polytomous case with a c parameter.
I think you could add this line right before you sample
response_pr[1] <- ifelse(c_par > 0, 1 - response_pr[2], response_pr[1])
so irt_get would be:
irt_gen <- function(theta, a_par = 1, b_par, c_par = 0, D = 1) {
unsummed <- c(0, D * a_par * (theta - b_par))
num <- exp(cumsum(unsummed))
den <- sum(num)
response_pr <- c_par + (1 - c_par) * (num / den)
response_pr[1] <- ifelse(c_par > 0, 1 - response_pr[2], response_pr[1])
y <- sample(1:length(response_pr) - 1, size = 1, prob = response_pr)
return(y)
}
but I haven't tested this and I'm still not entirely sure what is vectorized here.
Another workaround would be to use Sinan's code when c_par > 0 and your existing code otherwise.
Good ideas, @pdbailey0, we'll work on that ASAP. 👍🏽
Changes implemented and will be included on the next CRAN release of the package. @Sinan-Yavuz and @pdbailey0, thank you once again for your contribution; we have added you as package contributors, please let us know if the information below needs any modification (e.g. typo or addition of a field such as e-mail):
https://github.com/tmatta/lsasim/blob/97934b455b6e2fc6520f38dc58d85f3fca862af9/DESCRIPTION#L17-L18
@wleoncio, I am honored to be mentioned as a contributor though it is not necessary. I will make a modification to my name and email. At some point, if you change your mind and think my contribution is no longer valid or important, feel free to remove me.
Looking forward to the Pull Request containing the modifications, @Sinan-Yavuz!
Your and Paul's mentions were based on the documentation for the person()
function (see extract below) and your authorship of the code on commits 5f3396fdd5075a6ac3181fcd2cb4252db0ca9086 and dfb145087bd12673d59730fbb899e10a38cae891.
‘"ctb"’ (Contributor) Use for authors who have made smaller contributions (such as code patches etc.) but should not show up in the package citation.
By "package citation", they mean the contents of citation("lsasim")
.
Hello,
Thank you very much for creating "lsasim".
I have found a bug and want to share it with you.
irt_gen function doesn't produce a 0-1 matrix based on the parameters. I have changed the following to produce a proper dataset.
Let me know if you need me to provide you some examples, why irt_gen doesn't work.