xluo11 / xxIRT

R package for item response theory

Does the ATA algorithm consider item groups? #4

Closed giadasp closed 6 years ago

giadasp commented 6 years ago

Dear Xiao Luo, I tried to apply your ATA algorithm in R and I didn't find a way to include units (groups of items) in the constraints. Is that somehow possible? Moreover, there is no documentation on enemy items. I created a vector (e.g. enemy1) for a group of enemies, containing the ids of the items that cannot appear together in the same form (e.g. enemy1 <- c(1, 2, 3)), and then I did x <- ata_item_enemy(x, enemy1). Is that correct? Another important constraint you could add to your algorithm is the overlap between forms; in your package you only consider item use. I hope my suggestions and questions help you improve your package. Thank you for your wonderful work!

Giada

xluo11 commented 6 years ago

Hi Giada,

For item sets, you can collapse them into "individual items" and use those to run the ATA. Remember to aggregate the item properties appropriately, and avoid using the default test-length control, because these new "aggregated items" no longer have a length of 1. See Example #4 in the ATA section of the documentation for an example.

The way you set the enemy relationship is correct. Just pass in a vector of the IDs of the enemy items. In your example, the algorithm will put at most one of the first three items into each form.
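For reference, this is the usage from the original post (assuming x is an ata object that has already been set up):

```r
# ids of items that must not appear together in the same form
enemy1 <- c(1, 2, 3)

# register the enemy group; the solver will select at most one of items 1-3 per form
x <- ata_item_enemy(x, enemy1)
```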

Hope it helps.

Xiao

giadasp commented 6 years ago

The problem is that you cannot aggregate non-statistical properties (such as "content" in your examples) if an item set contains items with different values, and I would have to modify the module3_ata code to account for the set length. Is that right? Could you explain better what you mean by the "collapse" argument in the context of test assembly? Thank you for your help.

Giada SP

xluo11 commented 6 years ago

I thought about writing something to handle item sets, but realized that tweaking the pool might be a cleaner approach. For categorical/discrete properties, you can do the dummy coding first and aggregate afterwards. For example, suppose one item set has three items in three content domains. The original pool would use three rows to represent these three items, with a variable named 'content' coded as '1', '2', '3'. The aggregated pool will have one row representing the item set, with three variables named 'content_1', 'content_2', and 'content_3', each with a value of 1, meaning the set has one item in each content domain. Quantitative/continuous properties can be aggregated without dummy coding.
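To make the dummy coding and aggregation concrete, here is a small made-up example in base R (the pool, the column names such as 'set_id' and 'content', and the values are purely for illustration):

```r
# toy pool: two item sets, a categorical 'content' code, and information at theta = 0
pool <- data.frame(
  id      = 1:6,
  set_id  = c(1, 1, 1, 2, 2, 2),
  content = c(1, 2, 3, 1, 1, 2),
  info_0  = c(0.40, 0.35, 0.50, 0.45, 0.30, 0.55)
)

# dummy-code the categorical property
for (k in 1:3) pool[[paste0("content_", k)]] <- as.integer(pool$content == k)

# each row is one item, so this column sums to the set size after aggregation
pool$n <- 1

# collapse to one row per item set: sum information, content counts, and set size
sets <- aggregate(cbind(info_0, content_1, content_2, content_3, n) ~ set_id,
                  data = pool, FUN = sum)
sets
#   set_id info_0 content_1 content_2 content_3 n
# 1      1   1.25         1         1         1 3
# 2      2   1.30         2         1         0 3
```

Each row of 'sets' now behaves like one "aggregated item": content_k counts the items in domain k and n gives the set length.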

You do need to create an extra variable representing the number of items in the set, and use that as a regular constraint to control the overall test length, rather than using the 'len' argument in the 'ata' function; the 'len' argument treats each row of the pool as a single item, which is not true in your case. For the same reason, create variables containing the information at the theta points you want to optimize. The 'coef' argument in 'ata_obj_relative' and 'ata_obj_absolute' can take a vector of thetas and compute the information at those points, but that is based on the assumption that each row of the pool is one item.

A simple walk-through of Example 4: first, I generated an item pool of 100 items using the 3PL model, along with content codes (three domains) and item-set IDs (randomly assigning a code from 1 to 30 to the 100 items). Next, I computed the information at three theta points (-1, 0, 1), which is used later to control the test information function. Then I used the item-set IDs to group individual items into sets: adding up the information at theta = -1, 0, and 1, counting how many items fall in content domains 1, 2, and 3, and counting the number of items in the set. At that point I had a pool of 30 rows and 8 variables (id, info_-1, info_0, info_1, content_1, content_2, content_3, n). Now it is time to set up the ATA problem. First, use the 'ata' function to import the pool and ask it to assemble 2 forms. Next, ask the solver to maximize information at -1, 0, and 1. Finally, ask the solver to assemble 10 items in total in each form, with 3 items from domain 1, 3 from domain 2, and 4 from domain 3. You can find the results in the documentation.
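A rough sketch of these steps in R is below. The data-preparation part is plain base R; the ATA calls at the end follow my reading of the package documentation, and in particular 'ata_constraint', 'ata_solve', and exact argument names such as 'nform' are assumptions that may differ across package versions, so treat them as a sketch and check Example 4 in the documentation for the authoritative version:

```r
library(xxIRT)
set.seed(1)

# 100 3PL items with random content domains (1-3) and item-set ids (1-30)
items <- data.frame(
  a = rlnorm(100, 0, 0.2),
  b = rnorm(100, 0, 1),
  c = rbeta(100, 5, 45),
  content = sample(1:3, 100, replace = TRUE),
  set_id  = sample(1:30, 100, replace = TRUE)
)

# 3PL item information (D = 1.7) at theta = -1, 0, 1
# (columns named info_m1 / info_0 / info_p1 here instead of info_-1 / info_0 / info_1
#  to avoid backticks in formulas)
info3pl <- function(t, a, b, c) {
  p <- c + (1 - c) / (1 + exp(-1.7 * a * (t - b)))
  (1.7 * a)^2 * (1 - p) / p * ((p - c) / (1 - c))^2
}
items$info_m1 <- info3pl(-1, items$a, items$b, items$c)
items$info_0  <- info3pl( 0, items$a, items$b, items$c)
items$info_p1 <- info3pl( 1, items$a, items$b, items$c)

# dummy-code content and collapse items into sets: one row per set
for (k in 1:3) items[[paste0("content_", k)]] <- as.integer(items$content == k)
items$n <- 1  # sums to the number of items in the set
pool <- aggregate(cbind(info_m1, info_0, info_p1,
                        content_1, content_2, content_3, n) ~ set_id,
                  data = items, FUN = sum)

# ATA setup: 2 forms; no 'len' argument because rows are sets, not items
x <- ata(pool, nform = 2)

# maximize information at the three theta points, using the aggregated columns
x <- ata_obj_relative(x, pool$info_m1, mode = "max")
x <- ata_obj_relative(x, pool$info_0,  mode = "max")
x <- ata_obj_relative(x, pool$info_p1, mode = "max")

# 10 items per form in total (via the set-size variable), and 3/3/4 items per domain
x <- ata_constraint(x, pool$n,         min = 10, max = 10)
x <- ata_constraint(x, pool$content_1, min = 3,  max = 3)
x <- ata_constraint(x, pool$content_2, min = 3,  max = 3)
x <- ata_constraint(x, pool$content_3, min = 4,  max = 4)

# solve (the discussion below uses the GLPK backend, ata_solve_glpk)
x <- ata_solve(x)
```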

If you can post your sample data and the problem you are trying to solve, I'll try to write some code to illustrate it.

giadasp commented 6 years ago

Thank you very much for your clear and detailed explanation! I will try what you suggested. I cannot post my data because they are real and cannot be shared, but I will try to reproduce something similar and post it.

Anyway, I think it is possible to handle the item-set problem by modifying the constraint matrix directly, without going through dummy variables, so if you want to make everything more user-friendly I suggest working on that: more variables means more constraints, and that makes the model harder and harder to optimize.

I will update you in the next few days with feedback on my item-set problem.

Giada SP

xluo11 commented 6 years ago

Thanks for the suggestion. I'll surely work on that in the future.

giadasp commented 6 years ago

Hello again, I tried to use your method to handle item sets and I think I succeeded (I'm not completely sure because I haven't had time to check). Anyway, I had problems with lpSolveAPI (it didn't find any solution), so I tried to use the glpkAPI package instead, but I ran into some errors when running the code. I made some modifications to your ata_solve_glpk function, and this is what I'm using now (it works for me):

```r
# requires the glpkAPI and dplyr packages
library(glpkAPI)
library(dplyr)

# timeout (seconds) and mip.gap were not defined in the original snippet;
# they are added here as arguments with assumed defaults
ata_solve_glpk <- function(x, timeout = 10, mip.gap = 0.01, ...) {
  if (class(x) != "ata") stop("not an 'ata' object")
  lp <- initProbGLPK()
  addRowsGLPK(lp, nrow(x$mat))
  addColsGLPK(lp, ncol(x$mat))

  # (max): optimization direction
  setObjDirGLPK(lp, ifelse(x$max, GLP_MAX, GLP_MIN))

  # (types): x's = binary, y = continuous
  types <- sapply(x$types, function(x) switch(x, "C" = GLP_CV, "I" = GLP_IV, "B" = GLP_BV))
  for (i in seq_along(types)) setColKindGLPK(lp, i, types[i])

  # (obj): omit coef = 0
  for (i in seq_along(x$obj)) if (x$obj[i] != 0) setObjCoefGLPK(lp, i, x$obj[i])

  # (dir & rhs): row bounds
  dir <- sapply(x$dir, function(x) switch(x, "<=" = GLP_UP, ">=" = GLP_LO, "==" = GLP_FX))
  for (i in 1:nrow(x$mat)) setRowBndGLPK(lp, i, dir[i], lb = x$rhs[i], ub = x$rhs[i])

  # (bounds): column bounds
  # note: bounds.lb / bounds.ub are computed as in the posted code but not passed to GLPK here
  bounds.lb <- sapply(x$types, function(x) switch(x, "C" = 0, "I" = 0, "B" = 0))
  bounds.ub <- sapply(x$types, function(x) switch(x, "C" = Inf, "I" = Inf, "B" = 1))
  with(x$bounds, for (i in seq_along(ind)) {
    if (!is.na(lb[i])) bounds.lb[ind[i]] <- lb[i]
    if (!is.na(ub[i])) bounds.ub[ind[i]] <- ub[i]
  })

  # (mat): non-zero entries of the constraint matrix in row/column/value triplet form
  ind <- (x[["mat"]] != 0)
  indv <- list()
  indv$mat <- as.vector(x[["mat"]])
  indv$ind <- as.vector(ind)
  nrow <- nrow(ind)
  ncol <- ncol(ind)
  indv$rowind <- rep(1:nrow, ncol)
  index <- 0
  for (i in 1:ncol) {
    indv$colind[(index + 1):(i * nrow)] <- rep(i, times = nrow)
    index <- nrow * i
  }
  indvdf <- list2df(indv)
  ia <- filter(indvdf, ind == "TRUE")$rowind
  ja <- filter(indvdf, ind == "TRUE")$colind
  ar <- filter(indvdf, ind == "TRUE")$mat
  # original (replaced) computation, kept for reference:
  # ia <- rep(1:nrow(x[["mat"]]), ncol(x[["mat"]]))[ind]
  # ja <- rep(1:ncol(x[["mat"]]), each = nrow(x[["mat"]]))[ind]
  # ar <- x[["mat"]][ind]
  # equivalently: ia <- row(x[["mat"]])[x[["mat"]] != 0]; ja <- col(x[["mat"]])[x[["mat"]] != 0]
  loadMatrixGLPK(lp, length(ar), ia, ja, ar)

  # mip control parameters
  setMIPParmGLPK(PRESOLVE, GLP_ON)
  setMIPParmGLPK(MIP_GAP, mip.gap)
  setMIPParmGLPK(TM_LIM, timeout * 1000)
  opts <- list(...)
  for (i in seq_along(opts)) setMIPParmGLPK(get(names(opts)[i]), opts[[i]])

  # set bound for y: positive = (lb = 0); negative = (ub = 0)
  setColBndGLPK(lp, x$nlp, ifelse(x$negative, GLP_UP, GLP_LO), 0, 0)

  # solve
  status <- solveMIPGLPK(lp)
  optimum <- mipObjValGLPK(lp)
  # decision variables only (the last column is the auxiliary y variable)
  result <- matrix(mipColsValGLPK(lp)[(2:x$nlp) - 1], ncol = x$nforms, byrow = FALSE)
  list(status = status, optimum = optimum, result = result)
}
```

Where list2df() is:

```r
# pad the list elements with NA to a common length and bind them into a data frame
list2df <- function(x) {
  MAX.LEN <- max(sapply(x, length), na.rm = TRUE)
  DF <- data.frame(lapply(x, function(x) c(x, rep(NA, MAX.LEN - length(x)))))
  colnames(DF) <- names(x)
  DF
}
```

In detail, I had problems with the ia, ja, and ar vectors because they contained NA values when built with your algorithm. I know my version may not be very efficient (for loops, etc.), but it works for now. Let me know if this is helpful for improving your code. Thanks
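As a side note, the same row/column/value triplets can also be built without the loop and the helper data frame. This is just an alternative sketch of that block inside ata_solve_glpk (note that which() silently drops NA entries of the matrix rather than propagating them into ia/ja/ar):

```r
# non-zero entries of the constraint matrix in triplet form
nz <- which(x$mat != 0, arr.ind = TRUE)
ia <- nz[, "row"]   # row indices
ja <- nz[, "col"]   # column indices
ar <- x$mat[nz]     # the non-zero coefficients
loadMatrixGLPK(lp, length(ar), ia, ja, ar)
```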

Giada

xluo11 commented 6 years ago

I'm glad to hear the good news, and thanks for the suggestions. I'll take them into consideration in the next round of development. Cheers.