j-kreis opened 6 years ago
The line
test_perf <- mod$control$summaryFunction(test_preds, lev = mod$finalModel$obsLevels)
is wrong because it runs postResample on the whole obs and pred columns, regardless of dataType. Since the test result is already calculated and included in true_perf, we can extract it and format it as a named vector so that it works with the code that follows.
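To make the problem concrete, here is a small base-R sketch. The data frame is a hypothetical stand-in for what extractPrediction() returns when both training and test predictions are requested (columns obs, pred and dataType), and rmse() is a hand-written root mean squared error, assumed to match how postResample() defines RMSE. Neither is caret code.

```r
# Hypothetical stand-in for extractPrediction() output: predictions for the
# training rows and the held-out test rows are stacked in one data frame.
preds <- data.frame(
  obs      = c(1, 2, 3, 4, 5, 6),
  pred     = c(1, 2, 3, 5, 7, 9),
  dataType = c("Training", "Training", "Training", "Test", "Test", "Test")
)

# Root mean squared error of whatever rows are passed in.
rmse <- function(d) sqrt(mean((d$pred - d$obs)^2))

# Summarising every row mixes the (here perfect) training fit into the score.
mixed_rmse <- rmse(preds)

# Keeping only the Test rows gives the actual hold-out performance.
test_rmse <- rmse(preds[preds$dataType == "Test", ])

print(c(mixed = mixed_rmse, test = test_rmse))
```

In this toy case the mixed RMSE (about 1.53) understates the test RMSE (about 2.16), because the training rows are predicted without error and dilute the test errors.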
A temporary workaround is:
# test_perf <- mod$control$summaryFunction(test_preds, lev = mod$finalModel$obsLevels)
test_perf <- as.numeric(true_perf[1,-1])
names(test_perf) <- names(true_perf[1,-1])
Of course, a more elegant fix is still needed.
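The rest of the function turns the summary vector into a one-row data frame with as.data.frame(t(test_perf)), which is why the workaround has to produce a named numeric vector. A slightly shorter way to build that vector from a one-row data frame is unlist(). This is only a sketch: the layout of true_perf (a dataType column followed by metric columns) and its values are assumptions, since true_perf exists only in the debugging version of the function, and the training size is a placeholder.

```r
# Assumed layout of true_perf: one row, a label column, then the metrics.
true_perf <- data.frame(dataType = "Test", RMSE = 2.16, Rsquared = 0.9)

# unlist() on the metric columns gives a named numeric vector, equivalent to
# the as.numeric() + names() pair in the workaround for numeric columns.
test_perf <- unlist(true_perf[1, -1])

# Mirrors what the function does next: named vector -> one-row data frame.
test_perf_row <- as.data.frame(t(test_perf))
test_perf_row$Training_Size <- 50L  # placeholder training set size

print(test_perf_row)
```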
I realize that true_perf is a variable @juliangkr added to demonstrate the bug, so it is not available in caret itself. A possible fix is:
  if(test_prop > 0) {
    if(!mod$control$classProbs) {
      test_preds <- extractPrediction(list(model = mod),
                                      testX = dat[-for_model, colnames(dat) != outcome, drop = FALSE],
                                      testY = dat[-for_model, outcome])
    } else {
      test_preds <- extractProb(list(model = mod),
                                testX = dat[-for_model, colnames(dat) != outcome, drop = FALSE],
                                testY = dat[-for_model, outcome])
    }
    # select only the rows with dataType "Test" before summarising
    test_preds <- test_preds[test_preds$dataType == "Test", ]
    test_perf <- mod$control$summaryFunction(test_preds, lev = mod$finalModel$obsLevels)
    test_perf <- as.data.frame(t(test_perf))
    test_perf$Training_Size <- length(in_mod)
    tested[[i]] <- test_perf
    try(rm(test_preds, test_perf), silent = TRUE)
  }
  if(!mod$control$classProbs) {
    # no testX needed: only predictions on the training data are required here
    app_preds <- extractPrediction(list(model = mod))
  } else {
    app_preds <- extractProb(list(model = mod))
  }
  app_perf <- mod$control$summaryFunction(app_preds, lev = mod$finalModel$obsLevels)
  app_perf <- as.data.frame(t(app_perf))
  app_perf$Training_Size <- length(in_mod)
  apparent[[i]] <- app_perf
  try(rm(mod, in_mod, app_preds, app_perf), silent = TRUE)
} # closes the enclosing loop over i
I think there is a bug in the caret::learing_curve_dat() function (#278). The test-set performance is calculated on the model predictions for both the test and the training set (line 36 of the function). Here is my minimal example. I added a few print statements to caret::learing_curve_dat() to show the true performance compared with the one that is currently used. Additionally, the typo in the function name mentioned in #278 has still not been fixed.
Minimal, reproducible example:
The output of this code shows that the function returns only the mixed performance, calculated on both the training and the test set (see the dataType column).
Session Info: