mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com
Other
1.64k stars 405 forks source link

Add the option to load batchmark results without models to reduceBatchmarkResults() #2487

Closed sycoforbidden closed 4 years ago

sycoforbidden commented 6 years ago

Given a set of results in registry created with models, you can't load the benchmark result without the models. This could just require a small change in the underlying reduceBatchmarkResults() function by adding a .models = TRUE or .models = FALSE parameter.

since the .models parameter is given at head, even without changing any names the alternative would theoretically work. In the case where .models = TRUE but there is no model in the registry, one could maybe pass an error?

EDIT: use .models internally, models externally.

# EDIT: Better modified suggested function:

function (ids = NULL, keep.pred = TRUE, models = TRUE, show.info = getMlrOption("show.info"), 
                    reg = batchtools::getDefaultRegistry()) 
{
  requirePackages("batchtools", why = "batchmark", default.method = "load")
  assertFlag(keep.pred)
  assertClass(reg, "ExperimentRegistry")
  if (is.null(ids)) 
    ids = batchtools::findDone(reg = reg)
  if (NROW(ids) != nrow(batchtools::findExperiments(reg = reg))) 
    warning("Collecting results for a subset of jobs. The resulting BenchmarkResult may be misleading.")
  problem = algorithm = NULL
  tab = batchtools::getJobPars(ids, reg = reg)[, c("job.id", 
                                                   "problem", "algorithm")]
  setkeyv(tab, cols = c("problem", "algorithm"), physical = FALSE)
  result = namedList(tab[, unique(problem)])
  for (prob in names(result)) {
    algos = unique(tab[problem == prob], by = "algorithm")
    data = batchtools::makeJob(id = algos$job.id[1L], reg = reg)$problem$data
    result[[prob]] = namedList(algos$algorithm)
    for (algo in names(result[[prob]])) {
      res = batchtools::reduceResultsList(tab[problem == 
                                                prob & algorithm == algo], reg = reg)
      if (is.null(res[[1L]]$model) & models) 
        warning("models = TRUE but there are no models in BenchmarkResult")
      .models = !is.null(res[[1L]]$model) & models
      lrn = data$learner[[algo]]
      extract.this = getExtractor(lrn)
      rs = mergeResampleResult(learner.id = algo, task = data$task, 
                               iter.results = res, measures = data$measures, 
                               rin = data$rin, keep.pred = keep.pred, models = .models, 
                               show.info = show.info, runtime = NA, extract = extract.this)
      rs$learner = lrn
      result[[prob]][[algo]] = addClasses(rs, "ResampleResult")
    }
  }
  makeS3Obj(classes = "BenchmarkResult", results = result, 
            measures = data$measures, learners = data$learner[as.character(tab[, 
                                                                               unique(algorithm)])])
}
pat-s commented 6 years ago

Could you please provide a reprex, preferably with the reprex pkg?

sycoforbidden commented 6 years ago

I don't know how to do it, is there a guide? I also never made a package, but since there are no dependencies on the reduceBatchmarkResults(), it should work independently. Any help would be appreciated.

I just did the reprex, you can copy it from below. The last 3 lines are the important results.

EDIT: Making a better reproducible example.

pat-s commented 6 years ago

I am asking for a reproducible example. You just pasted (a lot of) function code.

The reprex may help you.

sycoforbidden commented 6 years ago

I am asking for a reproducible example. You just pasted (a lot of) function code.

The reprex may help you.

I think that should be it. The only change from one method to the other is the environment of the learner call, i don't know how important that can be, and it can be due to the code part where I change the function definition:

tmpfun <- get("reduceBatchmarkResults", envir = asNamespace("mlr"))
environment(newfun) <- environment(tmpfun)
attributes(newfun) <- attributes(tmpfun)
assignInNamespace("reduceBatchmarkResults", newfun, ns = "mlr")
sycoforbidden commented 5 years ago
library(pls)
library(batchtools)
library(mlr)

# ------------------------------------

# Proposed new function definition:

newfun <- function (ids = NULL, keep.pred = TRUE, models = TRUE, show.info = getMlrOption("show.info"), reg = batchtools::getDefaultRegistry()) { ... }

# ------------------------------------

# Start a new simple experiment.

tsk <- makeRegrTask(data = mtcars, target = "mpg")
lrn <- list(makeLearner("regr.plsr"),
            makeLearner("regr.nnet"))
bm <- list()

# ------------------------------------

# Original models = TRUE

makeExperimentRegistry(file.dir = "BM_TRUE", seed = 1234)
batchmark(learners = lrn,
          tasks = tsk,
          resamplings = cv3,
          measures = rmse,
          models = TRUE)

submitJobs()
bm[[1]] <- reduceBatchmarkResults(show.info = FALSE)
getBMRModels(bm[[1]])
#> $mtcars
#> $mtcars$regr.plsr
#> $mtcars$regr.plsr[[1]]
#> Model for learner.id=regr.plsr; learner.class=regr.plsr
#> Trained on: task.id = mtcars; obs = 21; features = 10
#> Hyperparameters: 
#> 
#> $mtcars$regr.plsr[[2]]
#> Model for learner.id=regr.plsr; learner.class=regr.plsr
#> Trained on: task.id = mtcars; obs = 21; features = 10
#> Hyperparameters: 
#> 
#> $mtcars$regr.plsr[[3]]
#> Model for learner.id=regr.plsr; learner.class=regr.plsr
#> Trained on: task.id = mtcars; obs = 22; features = 10
#> Hyperparameters: 
#> 
#> 
#> $mtcars$regr.nnet
#> $mtcars$regr.nnet[[1]]
#> Model for learner.id=regr.nnet; learner.class=regr.nnet
#> Trained on: task.id = mtcars; obs = 21; features = 10
#> Hyperparameters: size=3
#> 
#> $mtcars$regr.nnet[[2]]
#> Model for learner.id=regr.nnet; learner.class=regr.nnet
#> Trained on: task.id = mtcars; obs = 21; features = 10
#> Hyperparameters: size=3
#> 
#> $mtcars$regr.nnet[[3]]
#> Model for learner.id=regr.nnet; learner.class=regr.nnet
#> Trained on: task.id = mtcars; obs = 22; features = 10
#> Hyperparameters: size=3

# ------------------------------------

# Original models = FALSE

makeExperimentRegistry(file.dir = "BM_FALSE", seed = 1234)
batchmark(learners = lrn,
          tasks = tsk,
          resamplings = cv3,
          measures = rmse,
          models = FALSE)
submitJobs()

bm[[2]] <- reduceBatchmarkResults(show.info = FALSE)
getBMRModels(bm[[2]])
#> $mtcars
#> $mtcars$regr.plsr
#> NULL
#> 
#> $mtcars$regr.nnet
#> NULL

# ------------------------------------

# Change the function definition to the new proposed function.

tmpfun <- get("reduceBatchmarkResults", envir = asNamespace("mlr"))
environment(newfun) <- environment(tmpfun)
attributes(newfun) <- attributes(tmpfun)
assignInNamespace("reduceBatchmarkResults", newfun, ns = "mlr")

# ------------------------------------

# Test for TRUE, models = TRUE or FALSE

batchtools::loadRegistry("BM_TRUE", writeable = TRUE)
bm[[3]] <- mlr::reduceBatchmarkResults(models = TRUE, show.info = FALSE)
bm[[4]] <- mlr::reduceBatchmarkResults(models = FALSE, show.info = FALSE)
removeRegistry(wait = 0)

# ------------------------------------

# Test for FALSE, models = TRUE or FALSE, should give a warning for bm[[5]].

batchtools::loadRegistry("BM_FALSE", writeable = TRUE)
bm[[5]] <- mlr::reduceBatchmarkResults(models = TRUE, show.info = FALSE)

#> Warning in mlr::reduceBatchmarkResults(models = TRUE, show.info = FALSE):
#> models = TRUE but there are no models in BenchmarkResult

#> Warning in mlr::reduceBatchmarkResults(models = TRUE, show.info = FALSE):
#> models = TRUE but there are no models in BenchmarkResult

bm[[6]] <- mlr::reduceBatchmarkResults(models = FALSE, show.info = FALSE)
removeRegistry(wait = 0)

# ------------------------------------

# Results 1 and 3 should be identical, but they aren't due to a change in environment

identical(bm[[1]], bm[[3]])
#> [1] FALSE

attributes(bm[[1]]$results$mtcars$regr.nnet[[c(8, 1, 2, 16)]])$.Environment
#> <environment: 0x562723244e40>
attributes(bm[[3]]$results$mtcars$regr.nnet[[c(8, 1, 2, 16)]])$.Environment
#> <environment: 0x5627284f3d38>

identical(bm[[1]]$measures, bm[[3]]$measures)
#> [1] TRUE
identical(bm[[1]]$learners, bm[[3]]$learners)
#> [1] TRUE

# ------------------------------------

# The rest is identical:

identical(bm[[2]], bm[[6]])
#> [1] TRUE
identical(bm[[2]], bm[[5]])
#> [1] TRUE

# The result from the sapply below should be TRUE, FALSE, TRUE, FALSE, FALSE, FALSE.

mdl <- lapply(bm, getBMRModels)
sapply(1:length(mdl), function(i) {!is.null(mdl[[c(i,1,1)]])})
#> [1]  TRUE FALSE  TRUE FALSE FALSE FALSE

detach("package:mlr", unload = TRUE)
detach("package:batchtools", unload = TRUE)
detach("package:pls", unload = TRUE)
sycoforbidden commented 5 years ago
reduceBatchmarkResults <- function (ids = NULL, keep.pred = TRUE, models = TRUE, show.info = getMlrOption("show.info"), 
                    reg = batchtools::getDefaultRegistry()) 
{
  requirePackages("batchtools", why = "batchmark", default.method = "load")
  assertFlag(keep.pred)
  assertClass(reg, "ExperimentRegistry")
  if (is.null(ids)) 
    ids = batchtools::findDone(reg = reg)
  if (NROW(ids) != nrow(batchtools::findExperiments(reg = reg))) 
    warning("Collecting results for a subset of jobs. The resulting BenchmarkResult may be misleading.")
  problem = algorithm = NULL
  tab = batchtools::getJobPars(ids, reg = reg)[, c("job.id", 
                                                   "problem", "algorithm")]
  setkeyv(tab, cols = c("problem", "algorithm"), physical = FALSE)
  result = namedList(tab[, unique(problem)])
  for (prob in names(result)) {
    algos = unique(tab[problem == prob], by = "algorithm")
    data = batchtools::makeJob(id = algos$job.id[1L], reg = reg)$problem$data
    result[[prob]] = namedList(algos$algorithm)
    for (algo in names(result[[prob]])) {
      res = batchtools::reduceResultsList(tab[problem == 
                                                prob & algorithm == algo], reg = reg)
      if (is.null(res[[1L]]$model) & models) 
        warning("models = TRUE but there are no models in BenchmarkResult")
      .models = !is.null(res[[1L]]$model) & models
      lrn = data$learner[[algo]]
      extract.this = getExtractor(lrn)
      rs = mergeResampleResult(learner.id = algo, task = data$task, 
                               iter.results = res, measures = data$measures, 
                               rin = data$rin, keep.pred = keep.pred, models = .models, 
                               show.info = show.info, runtime = NA, extract = extract.this)
      rs$learner = lrn
      result[[prob]][[algo]] = addClasses(rs, "ResampleResult")
    }
  }
  makeS3Obj(classes = "BenchmarkResult", results = result, 
            measures = data$measures, learners = data$learner[as.character(tab[, 
                                                                               unique(algorithm)])])
}
stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.