mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com

makeModelMultiplexer for regr.glmnet produces error #647

Closed ecrutchfield closed 8 years ago

ecrutchfield commented 8 years ago

After creating a model multiplexer for regr.glmnet, any attempt to call predict with that learner will generate an error. For example...

  library(mlr)
  base.learners = list(
    makeLearner("regr.glmnet"),
    makeLearner("regr.randomForest")
  )
  learner = makeModelMultiplexer(base.learners)
  expect_equal(learner$predict.type, "response")
  # now let's see that the next code runs and does not complain about matrix output for
  # base learner predict output
  r = holdout(learner, regr.task)

Error in checkPredictLearnerOutput(.learner, .model, p) : 
  predictLearner for ModelMultiplexer has returned a class matrix instead of a numeric! 

This same error is also generated if you call tuneParams().

To fix this, you should be able to call glmnet's predict method with the 's' parameter and drop the matrix dimension, which yields a numeric vector instead of a matrix. The simple fix appears to be modifying the predictLearner.regr.glmnet method in RLearner_regr_glmnet.R as follows:

drop(predict(.model$learner.model, newx = .newdata, s=getHyperPars(.learner)$s, ...))
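
For context, here is a rough sketch of how that one-liner would sit inside the full method (the signature follows the usual predictLearner pattern in mlr); treat it as a sketch, not the exact upstream code:

  predictLearner.regr.glmnet = function(.learner, .model, .newdata, ...) {
    # drop() collapses the one-column matrix returned by glmnet's predict()
    # into the plain numeric vector that mlr expects from a regression learner;
    # the real method may additionally need to convert .newdata to a matrix for newx
    drop(predict(.model$learner.model, newx = .newdata,
      s = getHyperPars(.learner)$s, ...))
  }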

Here's a suggested test case for test_tune_ModelMultiplexer.R

test_that("ModelMultiplexer works for regression tasks", {
  base.learners = list(
    makeLearner("regr.glmnet"),
    makeLearner("regr.randomForest")
  )
  learner = makeModelMultiplexer(base.learners)
  expect_equal(learner$predict.type, "response")
  # now let's see that the next code runs and does not complain about matrix output for
  # base learner predict output
  r = holdout(learner, regr.task)

  # now check that we can tune hyperparameters of the base learners
  ps = makeModelMultiplexerParamSet(learner,
    makeDiscreteParam("alpha", values = c(0.5, 0.9)),
    makeDiscreteParam("mtry", values = c(2, 3))
  )
  rdesc = makeResampleDesc("Holdout")
  ctrl = makeTuneControlGrid()
  res = tuneParams(learner, regr.task, resampling = rdesc, par.set = ps, control = ctrl)
})

I tried forking master and fixing this myself, but I had lots of problems running the other tests: I kept getting segfaults and the R session would crash. Anyway, if you'd prefer I submit this fix via a pull request, I can give it a try.

Cheers, Eric

larskotthoff commented 8 years ago

Thanks for the report. If you submit a pull request, Travis will build it automatically and you can check whether anything else breaks this way. Running all of mlr's unit tests is unfortunately not very easy and requires about a million packages to be installed.

Doing a PR is the easiest way to check. You can always modify it later by changing the branch.

schiffner commented 8 years ago

Hi, I just read this issue and I'm not sure if we are tackling the core of the problem here.

In the definition of "regr.glmnet" (https://github.com/mlr-org/mlr/blob/master/R/RLearner_regr_glmnet.R) there is already a default set for s. If you train and predict with "regr.glmnet", this parameter is automatically passed through to predict and everything works. So using constructions such as getHyperPars(learner)$s in predict should normally not be necessary.

library(mlr)
library(testthat)

lrn = makeLearner("regr.glmnet")
lrn
# Learner regr.glmnet from package glmnet
# Type: regr
# Name: GLM with Lasso or Elasticnet Regularization; Short name: glmnet
# Class: regr.glmnet
# Properties: numerics,factors,ordered,weights
# Predict-Type: response
# Hyperparameters: s=0.01

holdout(lrn, bh.task)
# [Resample] holdout iter: 1
# [Resample] Result: mse.test.mean=27.7
# Resample Result
# Task: BostonHousing-example
# Learner: regr.glmnet
# mse.aggr: 27.66
# mse.mean: 27.66
# mse.sd: NA
# Runtime: 0.0361211

This automatic passing through should also work for the ModelMultiplexer. So in my opinion we should look there rather than fix individual learners. Right? @berndbischl @larskotthoff

berndbischl commented 8 years ago

Julia is correct. I am looking at this now

berndbischl commented 8 years ago

@ecrutchfield And please post what you did to run the unit tests and why it did not work. Then we can make the contributor help page better.

ecrutchfield commented 8 years ago

I think I found the issue. predictLearner.ModelMultiplexer is not passing on the additional parameters...

  predictLearner.ModelMultiplexer = function(.learner, .model, .newdata, ...) {
    predictLearner(bl, .model$learner.model$next.model, .newdata)
  }

should probably be:

  predictLearner.ModelMultiplexer = function(.learner, .model, .newdata, ...) {
    ...
    predictLearner(bl, .model$learner.model$next.model, .newdata, ...)
  }

However, the parameters are wrapped with the learner name, e.g. "regr.glmnet.s", so they would need to be cleaned up before being passed on to predictLearner. Is there already a function to strip the learner name from the parameter names?
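
Just to illustrate what I mean, a hypothetical helper for this could look roughly like the following; I haven't checked whether mlr already ships something equivalent:

  # hypothetical helper, not an existing mlr function: remove the "<learner.id>."
  # prefix from parameter names before forwarding them to the base learner
  stripLearnerPrefix = function(par.vals, learner.id) {
    names(par.vals) = sub(paste0(learner.id, "."), "", names(par.vals), fixed = TRUE)
    par.vals
  }

  stripLearnerPrefix(list(regr.glmnet.s = 0.01), "regr.glmnet")
  # $s
  # [1] 0.01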

Thanks, Eric

berndbischl commented 8 years ago

@ecrutchfield Yes, you are perfectly correct. I found exactly the same thing too :) I really, really appreciate proactive help, but I guess in this case it might be a bit too annoying to do that yourself, so I will do this. But please keep up the initiative!

berndbischl commented 8 years ago

In case you want to know, this probably fixes it:

  args = list(.learner = bl, .model = .model$learner.model$next.model, .newdata = .newdata)
  args = c(args, getHyperPars(bl))
  do.call(predictLearner, args)

The MM is unfortunately a somewhat more complex beast, just because of the annoying name-clash issue. What the user passes in are the prefixed names, but inside the base ensemble we store a list of the normal learners with their normal param settings. I am just using that above.
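
Put together with the signature quoted above, the corrected method would then look roughly like this; note that how bl (the selected base learner) is obtained is not shown in the snippets in this thread, so that line is an assumption:

  predictLearner.ModelMultiplexer = function(.learner, .model, .newdata, ...) {
    # assumption: the fitted base model keeps a reference to the base learner it was
    # trained with, so its plain (unprefixed) hyperparameters can be read off via getHyperPars
    bl = .model$learner.model$next.model$learner
    args = list(.learner = bl, .model = .model$learner.model$next.model, .newdata = .newdata)
    args = c(args, getHyperPars(bl))
    do.call(predictLearner, args)
  }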

I will do a PR now with unit tests.

ecrutchfield commented 8 years ago

Julia, Thanks for pointing me in the right direction! I did ultimately find the true issue but I think someone else is going to apply the fix.

Cheers, Eric

ecrutchfield commented 8 years ago

That worked nicely in my code. Thanks for the help!!

Cheers, Eric

schiffner commented 8 years ago

> Julia, Thanks for pointing me in the right direction! I did ultimately find the true issue but I think someone else is going to apply the fix.
>
> Cheers, Eric

@ecrutchfield: Thanks for the report and the work you put into fixing it. Cheers, Julia

berndbischl commented 8 years ago

> That worked nicely in my code. Thanks for the help!!

Small add-on: this is the "more" correct fix.

Here is the PR: #649

ecrutchfield commented 8 years ago

BTW, here are some details on the problems I had running the existing unit tests.

Running the tests using devtools::test() will crash my R session when it hits the test_classif_lssvm.R tests. I first uninstalled the kernlab package, then re-installed it, and was able to run that individual test case successfully. However, attempting to run the full set of tests again using devtools::test() caused the R session to crash again at test_classif_lssvm.R.

I then added a skip() call to that file but had a similar problem with other files. I added a skip() to each file to see if there was a pattern. The files that caused crashes are: test_classif_PART.R, test_classif_plr.R, and regr_penalized_ridge.R. I gave up after that.

Platform info:

  R version 3.2.2 (2015-08-14)
  Platform: x86_64-pc-linux-gnu (64-bit)
  Running under: Ubuntu 15.10

  locale:
  LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
  LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
  LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

  attached base packages:
  stats graphics grDevices utils datasets methods base

  other attached packages:
  mlr_2.8 ParamHelpers_1.7 ggplot2_2.0.0 BBmisc_1.9 testthat_0.11.0

  loaded via a namespace (and not attached):
  Rcpp_0.12.2 digest_0.6.8 crayon_1.3.1 grid_3.2.2 plyr_1.8.3 gtable_0.1.2
  magrittr_1.5 scales_0.3.0 stringi_1.0-1 reshape2_1.4.1 parallelMap_1.3
  checkmate_1.6.3 splines_3.2.2 tools_3.2.2 stringr_1.0.0 munsell_0.4.2
  survival_2.38-3 parallel_3.2.2 colorspace_1.2-6 memoise_0.2.1

larskotthoff commented 8 years ago

Can you use those learners? Running those tests works fine for me.

ecrutchfield commented 8 years ago

I think so. At least I tried to run test_classif_lssvm by itself and it worked fine. It's only when I try to run the full body of tests that it crashes my R session.

larskotthoff commented 8 years ago

There is a problem when too many packages are loaded and you get a "maximum number of open DLLs exceeded" error, but that shouldn't crash the session.

berndbischl commented 8 years ago

@ecrutchfield Running all tests is rather bothersome, even for us, although we know how to do it. Running everything and "making really sure" is what we use Travis for, like Lars told you.

But of course it must be possible to locally check the topic you are currently working on before you do the PR. Here are two ways to do this.

a) Use devtools::test_dir. In your case you can run

  devtools::test_dir("tests/testthat", filter = "ModelMultiplexer")

and later, when that runs and you want to check whether your code affected some other parts of mlr, you can run the "base" test group:

  devtools::test_dir("tests/testthat", filter = "base")

Side remark:

b) I sometimes like to run such tests on the console, with a fully installed devel version of mlr. @mllg wrote the rt tool for this; you can find it here: https://github.com/rdatsci/rt. In a very similar fashion as above, it then allows:

  rtest --filter=ModelMultiplexer
  rtest --filter=base

berndbischl commented 8 years ago

PR is merged. If @ecrutchfield replies about the tests, we can close this soon.

ecrutchfield commented 8 years ago

@berndbischl Thanks for the detailed explanation of your testing approach. Sometimes it's hard knowing what to run to do a thorough regression test when you're new to a project, so I usually start with a complete run as a starting point. I like your approach better! ;)

As far as I'm concerned, this issue is closed.

Thanks again for all your help!

BTW, you all have done an awesome job with this framework. I really love the 'benchmark' capabilities.

Cheers, Eric

berndbischl commented 8 years ago

Lars has updated the wiki page wrt local unit testing for contributors.