jeanimal / heuristica

Heuristic functions in R, such as Take The Best, unit-weighted linear (Dawes' rule), plus helper functions.

CRAN v1.1: Throw errors for mismatched criterion_col or cols_to_fit #38

Closed jeanimal closed 8 years ago

jeanimal commented 8 years ago

library(heuristica)

data(city_population)
criterion_col <- 3
cols_to_fit <- 4:ncol(city_population)

# reduce size of dataset
slice <- city_population[1:20,]

# EXAMPLE 1:

# fit two models
mod1 <- unitWeightModel(slice,criterion_col,cols_to_fit)
mod2 <- unitWeightModel(slice,criterion_col,c(4,6))

# get fitting accuracy
percentCorrectList(slice,list(mod1,mod2))

Error in getCuePairDirections(row1, row2) %*% coefficients : non-conformable arguments

percentCorrectList(slice,list(mod1))
  unitWeightModel
1        78.94737
percentCorrectList(slice,list(mod2))
  unitWeightModel
1        66.05263

# EXAMPLE 2:

mod1 <- logRegModel(slice,criterion_col,cols_to_fit)
mod2 <- logRegModel(slice,criterion_col,c(4,6))

percentCorrectList(slice,list(mod1,mod2))
  logRegModel logRegModel
1    86.84211    42.63158

percentCorrectList(slice,list(mod1))
  logRegModel
1    86.84211
percentCorrectList(slice,list(mod2))
  logRegModel
1    66.57895

jeanimal commented 8 years ago

What's going on: Several functions assume that the heuristics 1) use the same criterion and 2) use the same cols_to_fit. Comments in the functions state these assumptions, but now that this is a public package, the code should throw a clear error when an assumption is violated rather than throwing cryptic errors or, worse, giving wrong answers. That's step 1 of the fix.
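The "non-conformable arguments" message in Example 1 is base R's matrix-product error. Here is a minimal sketch of how it can arise, assuming the cue directions were computed for one model's columns while the coefficients came from a model fit on fewer columns. The vectors are made-up stand-ins, not heuristica internals.

```r
# Hypothetical stand-ins: directions for 3 cue columns, weights for only 2.
directions   <- matrix(c(1, 0, -1), nrow = 1)  # 1 x 3 row of cue directions
coefficients <- c(1, 1)                        # weights from a 2-cue model

# %*% requires the inner dimensions to match, so this raises an error.
msg <- tryCatch(
  {
    directions %*% coefficients
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With matching lengths the product is an ordinary weighted sum.
score <- directions[, 1:2, drop = FALSE] %*% coefficients
print(score)
```

The error names the arithmetic that failed, not the mismatched cols_to_fit that caused it, which is why an explicit check up front is easier to act on.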

The workarounds:

  1. Calculate percentCorrect separately for each model (as you did).
  2. Manually call heuristics() once per distinct cols_to_fit, so that each model with a different cols_to_fit sits in its own heuristics(). For example:
     predictions <- rowPairApply(slice, heuristics(mod1), heuristics(mod2))
     100 * categoryAccuracyAll(predictions, 1, c(2:ncol(predictions)))

The long-term fix. Step 2 of the fix is that percentCorrect should handle this case correctly. For every heuristic with a different cols_to_fit, it needs to create a separate heuristics(), as we did manually above. Currently it just puts them all in the same heuristics().
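One way to picture step 2 is to group the models by their cols_to_fit and build one heuristics() per group. The sketch below uses base R only, with plain lists as hypothetical stand-ins for fitted models. It assumes each model exposes its columns as a cols_to_fit field, and it is not the package's implementation.

```r
# Hypothetical stand-ins for fitted models: plain lists with a cols_to_fit field.
models <- list(
  list(name = "mod1", cols_to_fit = 4:6),
  list(name = "mod2", cols_to_fit = c(4, 6)),
  list(name = "mod3", cols_to_fit = 4:6)
)

# Build a key such as "4,5,6" for each model, then split the models by key.
keys <- vapply(models, function(m) paste(m$cols_to_fit, collapse = ","),
               character(1))
groups <- split(models, keys)

# Each group could then be passed to its own heuristics() call.
for (key in names(groups)) {
  group_names <- vapply(groups[[key]], function(m) m$name, character(1))
  cat("cols_to_fit", key, ":", paste(group_names, collapse = ", "), "\n")
}
```

Grouping this way loses the order in which the models were supplied, so a real implementation would also need to restore that order in the result columns.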

Note that we still cannot handle the case of different criterion columns, and I don't plan to handle it. However, I will change the code to raise an error if this happens rather than silently producing garbage.

jeanimal commented 8 years ago

I just committed code so that heuristics(), and thus percentCorrect, will throw an error for a mismatch. The change is spread across a few commits, so I won't cite a single commit. I also improved the documentation on this issue.

data <- cbind(y=c(30,20,10,5), x1=c(1,1,0,0), x2=c(1,1,0,1))
ttb <- ttbModel(data, 1, c(2:3))
ttb_just3 <- ttbModel(data, 1, c(3), fit_name="ttb_just3")
rowPairApply(data, heuristics(ttb, ttb_just3))
Error in FUN(X[[i]], ...) : ERROR: Models with different cols_to_fit: 2, 3 vs. 3 . Instead, put the models in separate heuristics functions, as shown in documentation examples.
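A check of this kind can be pictured with a small base-R sketch. The helper name check_same_cols_to_fit and the list-based stand-ins for models are hypothetical. The package's real check lives inside heuristics() and may differ in detail.

```r
# Hypothetical helper, not the package's code:
# stop with a descriptive message if models disagree on cols_to_fit.
check_same_cols_to_fit <- function(models) {
  first <- models[[1]]$cols_to_fit
  for (m in models[-1]) {
    # Compare values only, so integer 2:3 and numeric c(2, 3) count as equal.
    if (!identical(as.numeric(first), as.numeric(m$cols_to_fit))) {
      stop("Models with different cols_to_fit: ",
           paste(first, collapse = ", "), " vs. ",
           paste(m$cols_to_fit, collapse = ", "))
    }
  }
  invisible(TRUE)
}

# Stand-ins mirroring the two models in the example above.
ttb_like       <- list(cols_to_fit = c(2, 3))
ttb_just3_like <- list(cols_to_fit = c(3))

result <- tryCatch(
  check_same_cols_to_fit(list(ttb_like, ttb_just3_like)),
  error = function(e) conditionMessage(e)
)
print(result)
```

Failing at construction time means the mismatch is reported before any row pairs are compared, so no partial or misleading accuracy numbers are produced.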

jeanimal commented 8 years ago

Fixed and tested at af8bce172948fb0789b58ae1e92f1780432f80b7.

Running the example above now results in this output:

# EXAMPLE 1:

# fit two models
mod1 <- unitWeightModel(slice,criterion_col,cols_to_fit)
mod2 <- unitWeightModel(slice,criterion_col,c(4,6))

# get fitting accuracy
percentCorrectList(slice,list(mod1, mod2))
  unitWeightModel unitWeightModel
1        78.94737        66.05263

As further validation, we get the same answer when we reverse the order of the models. (The old code took cols_to_fit from the first model, which made order matter.)

percentCorrectList(slice,list(mod2, mod1))
  unitWeightModel unitWeightModel
1        66.05263        78.94737

Now the other example:

# EXAMPLE 2:

mod1 <- logRegModel(slice,criterion_col,cols_to_fit)
mod2 <- logRegModel(slice,criterion_col,c(4,6))

percentCorrectList(slice,list(mod1, mod2))
  logRegModel logRegModel
1    86.84211    66.57895
percentCorrectList(slice,list(mod1))
  logRegModel
1    86.84211
percentCorrectList(slice,list(mod2))
  logRegModel
1    66.57895

jeanimal commented 8 years ago

Note that there is a simpler function interface using percentCorrect rather than percentCorrectList:

percentCorrect(slice, mod1, mod2)
  logRegModel logRegModel
1    86.84211    66.57895