Closed jeanimal closed 8 years ago
What's going on: Several functions assume that the heuristics 1) use the same criterion and 2) use the same cols_to_fit. There are comments in the functions stating these assumptions, but now that this is a public package, the code should throw a clear error if either assumption is violated rather than throwing cryptic errors or, worse, giving wrong answers. That's step 1 of the fix.
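As a rough illustration of step 1, here is a minimal base R sketch of such a check. It is not the package's code: `make_model` and `check_models_agree` are hypothetical names, and the models are plain lists that are only assumed to carry `criterion_col` and `cols_to_fit` fields.

```r
# Hypothetical stand-ins for fitted models: plain lists carrying the two
# fields the check needs. These are not heuristica objects.
make_model <- function(criterion_col, cols_to_fit) {
  list(criterion_col = criterion_col, cols_to_fit = cols_to_fit)
}

# Compare column indices by value, so that 2:3 (integer) and c(2, 3)
# (double) count as the same columns.
same_cols <- function(a, b) identical(as.numeric(a), as.numeric(b))

# Stop with a readable message unless every model agrees with the first
# one on both criterion_col and cols_to_fit.
check_models_agree <- function(models) {
  if (length(models) == 0) return(invisible(TRUE))
  first <- models[[1]]
  for (model in models[-1]) {
    if (!same_cols(model$criterion_col, first$criterion_col)) {
      stop("Models with different criterion_col: ",
           first$criterion_col, " vs. ", model$criterion_col)
    }
    if (!same_cols(model$cols_to_fit, first$cols_to_fit)) {
      stop("Models with different cols_to_fit: ",
           paste(first$cols_to_fit, collapse = ", "), " vs. ",
           paste(model$cols_to_fit, collapse = ", "))
    }
  }
  invisible(TRUE)
}

# Matching models pass the check.
ok <- check_models_agree(list(make_model(1, 2:3), make_model(1, c(2, 3))))

# Mismatched cols_to_fit produce a readable error; capture its message.
msg <- tryCatch(
  check_models_agree(list(make_model(1, 2:3), make_model(1, 3))),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Failing early like this trades a little flexibility for a message that names the mismatched columns, instead of a matrix error or a silently wrong percentage further down the call stack.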
The workarounds:
The long-term fix. Step 2 of the fix is that percentCorrect should handle this case correctly. For every heuristic with different cols_to_fit, it needs to create a separate heuristics(), as we did manually above. Currently it just throws them all into the same heuristics().
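The grouping that step 2 calls for could look roughly like the base R sketch below. It only shows how models might be bucketed by their cols_to_fit so that each bucket could get its own heuristics() call. `group_by_cols_to_fit` is a hypothetical helper, and the models are plain lists with assumed `name` and `cols_to_fit` fields, not heuristica objects.

```r
# Group models by their cols_to_fit. Each group shares one set of columns,
# so each group could be passed to its own heuristics() call.
group_by_cols_to_fit <- function(models) {
  # Build a string key such as "2,3" for every model.
  keys <- vapply(models,
                 function(m) paste(m$cols_to_fit, collapse = ","),
                 character(1))
  # Keep groups in order of first appearance rather than sorted order.
  split(models, factor(keys, levels = unique(keys)))
}

# Hypothetical stand-ins for three fitted models.
models <- list(
  list(name = "ttb", cols_to_fit = 2:3),
  list(name = "ttb_just3", cols_to_fit = 3),
  list(name = "unit", cols_to_fit = c(2, 3))
)

groups <- group_by_cols_to_fit(models)
print(names(groups))
```

A real implementation would also need to put the per-group results back into the order in which the caller passed the models. That step is not shown here.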
Note that we still cannot handle the case of different criterion columns, and I don't plan to handle it. However, I will change the code to raise an error if this happens rather than silently producing garbage.
I just committed code so that heuristics(), and thus percentCorrect, will throw an error on a mismatch. It's spread across a few commits, so I won't list the commit hashes. I also improved the documentation on this issue.
data <- cbind(y=c(30,20,10,5), x1=c(1,1,0,0), x2=c(1,1,0,1))
ttb <- ttbModel(data, 1, c(2:3))
ttb_just3 <- ttbModel(data, 1, c(3), fit_name="ttb_just3")
rowPairApply(data, heuristics(ttb, ttb_just3))
Error in FUN(X[[i]], ...) : ERROR: Models with different cols_to_fit: 2, 3 vs. 3 . Instead, put the models in separate heuristics functions, as shown in documentation examples.
Fixed and tested at af8bce172948fb0789b58ae1e92f1780432f80b7.
Running the example above now results in this output:
EXAMPLE 1:
fit two models
mod1 <- unitWeightModel(slice,criterion_col,cols_to_fit)
mod2 <- unitWeightModel(slice,criterion_col,c(4,6))
get fitting accuracy
percentCorrectList(slice,list(mod1, mod2))
  unitWeightModel unitWeightModel
1        78.94737        66.05263
As further validation, we get the same answer when we reverse the order of the models. (The old code took cols_to_fit from the first model, which made order matter.)
percentCorrectList(slice,list(mod2, mod1))
  unitWeightModel unitWeightModel
1        66.05263        78.94737
Now the other example:
EXAMPLE 2:
mod1 <- logRegModel(slice,criterion_col,cols_to_fit)
mod2 <- logRegModel(slice,criterion_col,c(4,6))
percentCorrectList(slice,list(mod1, mod2))
  logRegModel logRegModel
1    86.84211    66.57895
percentCorrectList(slice,list(mod1))
  logRegModel
1    86.84211
percentCorrectList(slice,list(mod2))
  logRegModel
1    66.57895
Note that there is a simpler function interface using percentCorrect rather than percentCorrectList:
percentCorrect(slice, mod1, mod2)
  logRegModel logRegModel
1    86.84211    66.57895
library(heuristica)
data(city_population)
criterion_col <- 3
cols_to_fit <- 4:ncol(city_population)
reduce size of dataset
slice <- city_population[1:20,]
EXAMPLE 1:
fit two models
mod1 <- unitWeightModel(slice,criterion_col,cols_to_fit)
mod2 <- unitWeightModel(slice,criterion_col,c(4,6))
get fitting accuracy
percentCorrectList(slice,list(mod1,mod2))
Error in getCuePairDirections(row1, row2) %*% coefficients : non-conformable arguments
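The `non-conformable arguments` message is what base R's `%*%` reports when the two operands have incompatible lengths. The sketch below reproduces that situation in isolation. It assumes, based on the note in this thread that the old code took cols_to_fit from the first model, that the cue-direction row had one entry per column of the first model while the second model's coefficients covered only its own two columns. All values are hypothetical.

```r
# One row of cue directions with three entries, as if built from a first
# model fit on three columns (hypothetical values).
directions <- matrix(c(1, 0, -1), nrow = 1)

# Coefficients of a second model fit on only two columns (hypothetical values).
coefficients_two_cols <- c(0.5, 0.25)

# A 1x3 matrix cannot be multiplied by a length-2 vector, so %*% fails.
# Capture the error message instead of aborting.
msg <- tryCatch(
  directions %*% coefficients_two_cols,
  error = function(e) conditionMessage(e)
)
print(msg)

# With one coefficient per direction the product is a 1x1 matrix.
ok <- directions %*% c(0.5, 0.25, 0.1)
print(dim(ok))
```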
mod1 <- logRegModel(slice,criterion_col,cols_to_fit)
mod2 <- logRegModel(slice,criterion_col,c(4,6))
percentCorrectList(slice,list(mod1,mod2))
  logRegModel logRegModel
1    86.84211    42.63158
percentCorrectList(slice,list(mod1))
  logRegModel
1    86.84211
percentCorrectList(slice,list(mod2))
  logRegModel
1    66.57895