Closed ecrutchfield closed 8 years ago
Thanks for the report. If you submit a pull request, Travis will build it automatically and you can check whether anything else breaks this way. Running all of mlr's unit tests is unfortunately not very easy and requires about a million packages to be installed.
Doing a PR is the easiest way to check. You can always modify it later by changing the branch.
Hi, I just read this issue and I'm not sure if we are tackling the core of the problem here.
In the definition of "regr.glmnet" (https://github.com/mlr-org/mlr/blob/master/R/RLearner_regr_glmnet.R) there is a default for s
already set. If you train and predict "regr.glmnet" this parameter is automatically passed through to predict
and everything works.
So using constructions as getHyperPars(learner)$s
in predict
should normally not be necessary.
library(mlr)
library(testthat)
lrn = makeLearner("regr.glmnet")
lrn
# Learner regr.glmnet from package glmnet
# Type: regr
# Name: GLM with Lasso or Elasticnet Regularization; Short name: glmnet
# Class: regr.glmnet
# Properties: numerics,factors,ordered,weights
# Predict-Type: response
# Hyperparameters: s=0.01
holdout(lrn, bh.task)
# [Resample] holdout iter: 1
# [Resample] Result: mse.test.mean=27.7
# Resample Result
# Task: BostonHousing-example
# Learner: regr.glmnet
# mse.aggr: 27.66
# mse.mean: 27.66
# mse.sd: NA
# Runtime: 0.0361211
This automatic passing through should also work for the ModelMultiplexer. So in my opinion we should rather look there than fixing single learners. Right? @berndbischl @larskotthoff
Julia is correct. I am looking at this now
@ecrutchfield And please post what you did to run the unit tests and why it did not work. Then we can make the contributer help page better
I think I found the issue. predictLearner.ModelMultiplexer is not passing on the additional parameters...
predictLearner.ModelMultiplexer = function(.learner, .model, .newdata, ...) { predictLearner(bl, .model$learner.model$next.model, .newdata) }
should probably be... predictLearner.ModelMultiplexer = function(.learner, .model, .newdata, ...) { ... predictLearner(bl, .model$learner.model$next.model, .newdata, ...) }
However, the parameters are wrapped with the learner name e.g. "regr.glmnet.s" so they should be cleaned up prior to passing on to predictLearner. Is there already a function to strip the learner name from the parameter?
Thanks, Eric
On 01/06/2016 12:01 PM, Bernd Bischl wrote:
Julia is correct. I am looking at this now
— Reply to this email directly or view it on GitHub https://github.com/mlr-org/mlr/issues/647#issuecomment-169388531.
@ecrutchfield Yes, you are perfectly correct. I found exactly the same thing too :) I really, really appreciate proactive help, but I guess in this case it might be a bit too annoying to do that yourself, so I will do this. But please keep up the initiative!
If you want to know this probably fixes it:
args = list(.learner = bl, .model = .model$learner.model$next.model, .newdata = .newdata)
args = c(args, getHyperPars(bl))
do.call(predictLearner, args)
the MM is unfortunately a somewhat more complex beast, just because of the annoying name clash thingy. So, what the user passes in are the prefixed names, but inside of the base ensemble, we store a list of the normal learners with their normal param settings. I am just using that above.
I will do a PR now with unit tests.
Julia, Thanks for pointing me in the right direction! I did ultimately find the true issue but I think someone else is going to apply the fix.
Cheers, Eric
On 01/06/2016 10:10 AM, Julia Schiffner wrote:
Hi, I just read this issue and I'm not sure if we are tackling the core of the problem here.
In the definition of "regr.glmnet" (https://github.com/mlr-org/mlr/blob/master/R/RLearner_regr_glmnet.R) there is a default for |s| already set. If you train and predict "regr.glmnet" this parameter is /automatically/ passed through to |predict| and everything works. So using constructions as |getHyperPars(learner)$s| in |predict| should normally not be necessary.
|library(mlr) library(testthat) lrn = makeLearner("regr.glmnet") lrn # Learner regr.glmnet from package glmnet # Type: regr # Name: GLM with Lasso or Elasticnet Regularization; Short name: glmnet # Class: regr.glmnet # Properties: numerics,factors,ordered,weights # Predict-Type: response # Hyperparameters: s=0.01 holdout(lrn, bh.task)
[Resample] holdout iter: 1 # [Resample] Result: mse.test.mean=27.7
Resample Result # Task: BostonHousing-example # Learner: regr.glmnet # mse.aggr: 27.66 # mse.mean: 27.66 # mse.sd: NA # Runtime: 0.0361211 |
This automatic passing through should also work for the ModelMultiplexer. So in my opinion we should rather look there than fixing single learners. Right? @berndbischl https://github.com/berndbischl @larskotthoff https://github.com/larskotthoff
— Reply to this email directly or view it on GitHub https://github.com/mlr-org/mlr/issues/647#issuecomment-169348684.
That worked nicely in my code. Thanks for the help!!
Cheers, Eric
On 01/06/2016 12:36 PM, Bernd Bischl wrote:
If you want to know this probably fixes it:
|args = list(.learner = bl, .model = .model$learner.model$next.model, .newdata = .newdata) args = c(args, getHyperPars(bl)) do.call(predictLearner, args) |
the MM is unfortunately a somewhat more complex beast, just because of the annoying name clash thingy. So, the user passes is the prefixed names, but inside of the base ensemble, we store a list of the normal learners with their normal param settings. I am just using that above.
I will do a PR now with unit tests.
— Reply to this email directly or view it on GitHub https://github.com/mlr-org/mlr/issues/647#issuecomment-169398809.
Julia, Thanks for pointing me in the right direction! I did ultimately find the true issue but I think someone else is going to apply the fix.
Cheers, Eric
@ecrutchfield: Thanks for the report and the work you put into fixing it. Cheers, Julia
That worked nicely in my code. Thanks for the help!!
Small add on: This is "more" correct.
Here is the PR:
BTW, here are some details on the problems I had running the existing unit tests.
Running the tests using devtools::test() will crash my R session when it hits the test_classif_lssvm.R tests. I first uninstalled the kernlab package, then re-installed it and was able to run the individual test_case successfully. However, attempting to run the full set of tests again using devtools::test() caused the R session to crash again at test_classif_lssvm.R.
I then added a skip() call to that file but had a similar problem with other files. I added a skip() to each file to see if there was a pattern. The files that caused crashes are: test_classif_PART.R, test_classif_plr.R, and regr_penalized_ridge.R. I gave up after that.
Platform info: R version 3.2.2 (2015-08-14) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 15.10
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] mlr_2.8 ParamHelpers_1.7 ggplot2_2.0.0 BBmisc_1.9 testthat_0.11.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.2 digest_0.6.8 crayon_1.3.1 grid_3.2.2 plyr_1.8.3 gtable_0.1.2 magrittr_1.5 scales_0.3.0 stringi_1.0-1 reshape2_1.4.1 parallelMap_1.3 checkmate_1.6.3 splines_3.2.2 tools_3.2.2 stringr_1.0.0 munsell_0.4.2 survival_2.38-3 parallel_3.2.2
colorspace_1.2-6 memoise_0.2.1
Can you use those learners? Running those tests works fine for me.
I think so. At least I tried to run test_classif_lssvm by itself and it worked fine. It's only when I try to run the full body of tests that it crashes my R session.
There is a problem when too many packages are loaded and you get a "maximum number of open DLLs exceeded" error, but that shouldn't crash the session.
@ecrutchfield Running all tests is rather bothersome. Even for us, although we know how to do this. Running everything, and "making really sure", that is what we use travis for, like Lars told you.
But of course it must be possible to check the topic you are currently working on, before you do the PR, locally. Here are 2 ways to do this.
a) use devtools::test_dir So in your case you can run devtools::test_dir("tests/testthat", filter = "ModelMultiplexer") and later, when that runs, and you wanna check whether your code affected some other parts of mlr you can run this test group "base"
devtools::test_dir("tests/testthat", filter = "base")
Side remarks:
b) I sometimes like to do such tests on the console, with a fully installed devel version of mlr. @mllg wrote the rt tool for this. You can find it here. https://github.com/rdatsci/rt It then allows, in a very similar fashion as above: rtest --filter=ModelMultiplexer rtest --filter=base
PR is merged. if @ecrutchfield replies about the tests, we can close this soon
@berndbischl Thanks for the detailed explanation of your testing approach. Sometimes it's hard knowing what to run to do a thorough regression test when you're new to a project, so I usually start with a complete run as a starting point. I like your approach better! ;)
As far as I'm concerned, this issue is closed.
Thanks again for all your help!
BTW, you all have done an awesome job with this framework. I really love the 'benchmark' capabilities.
Cheers, Eric
lars has updated the wiki page wrt local unit testing for contributors
After creating a model multiplexer for regr.glmnet, any attempt to call predict with that learner will generate an error. For example...
This same error is also generated if you call tuneParams().
To fix this you should be able to call glmnet's predict method with the 's' param and it will generate a numeric vector, instead of a matrix. It appears the simple fix is to modify the predictLearner.regr.glmnet method in RLearner_regr_glmnet.R as follows:
Here's a suggested test case for test_tune_ModelMultiplexer.R
I tried forking master and fixing this myself, but I had lot's of problems running the others tests i.e. I kept getting seg. faults and the R session would crash. Anyway, if you'd prefer I submit this fix via a pull request, I can give it a try.
Cheers, Eric