Closed RaphaelS1 closed 3 months ago
So not what's there, currently in memory, but what's available for installation? But that function would need access to GitHub? And it would need to download packages and install them just to be able to peek inside them?
OK, I also just read your other issue #82. The thing is: where would that registry live? And who maintains it, and how? Currently it is very simple for a new party to add an extension learner package for mlr without talking to us: you simply put it on GitHub.
Not necessarily. I'd imagine that this would just be a registry of strings, e.g. a data.table that is appended to once by whoever adds a new learner, so of the form

| id | package | properties |
|---|---|---|
| classif.xgboost | xgboost | ... |

It would make sense for it to live in `mlr3learners`, but it could also live in `mlr3`; it would be quite lightweight...
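To make the proposal concrete, here is a minimal sketch of such a registry as a plain data frame with a lookup helper. The names `learner_registry` and `lookup_learner()` and the example rows are assumptions for illustration only; they are not part of mlr3 or mlr3learners, and the property values have not been checked against the packages.

```r
# Hypothetical registry: one row per learner, appended by whoever adds a learner.
# All values below are illustrative placeholders, not verified metadata.
learner_registry <- data.frame(
  id = c("classif.xgboost", "classif.C5.0", "classif.gbm"),
  package = c("xgboost", "C50", "gbm"),
  properties = c("twoclass, multiclass", "twoclass, multiclass", "twoclass"),
  stringsAsFactors = FALSE
)

# Return the registry row for one learner id, or fail with a clear message.
lookup_learner <- function(registry, id) {
  hit <- registry[registry$id == id, , drop = FALSE]
  if (nrow(hit) == 0L) {
    stop(sprintf("No learner with id '%s' in the registry", id))
  }
  hit
}

lookup_learner(learner_registry, "classif.xgboost")
```

Because the registry holds only strings, looking up a learner needs neither the learner package nor its dependencies to be installed.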
> currently it is very simple for a new party to add an extension learner package for mlr, without talking to us, you simply put it on github.

Surely one of you checks this package though, and verifies it's up to some level of standard? You could also ask the maintainer of the extension to put in a PR to make sure they are added to the registry. Ultimately, if they don't do it then it's their loss.
Well, no, what you are describing is a different process.
a) This is how it is currently: you write a new learner extension package and put it somewhere on GitHub. We have given you enough unit-testing tools to demonstrate that it works. Now maybe the whole mlr3 team is on extended leave; you can still publish your package and everything works.
b) If we do what you propose, we now have to update package X on CRAN (mlr3learners or mlr3, probably the former?) each time we have to update that table?
OTOH you COULD argue that we are already maintaining the wiki table on GitHub? Your whole issue here in one sentence is basically: "why is the wiki table not in a machine-readable format", correct?
I'm not suggesting you push updates to CRAN for each learner. They can wait until the next release.
But yes, that is essentially the issue, because when working in R I don't want to go back and forth between R and GitHub.
Machine-readable format:
available.packages(repos = "https://mlr3learners.github.io/mlr3learners.drat")
Not saying that this is very convenient, we should really look into making the additional learners easier to discover and install.
Oh, and if you want properties ... yes, we might need to create a JSON file for this.
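For readers who only want package names and versions from that listing, the character matrix returned by `available.packages()` can be cut down to a few columns. The helper name `trim_package_listing()` is an assumption for illustration, and the small matrix below is an offline stand-in built from two rows quoted in this thread. Note that this still lists packages, not the learners inside them.

```r
# Keep only selected columns of an available.packages()-style character matrix.
trim_package_listing <- function(ap, cols = c("Package", "Version")) {
  stopifnot(is.matrix(ap), all(cols %in% colnames(ap)))
  out <- as.data.frame(ap[, cols, drop = FALSE], stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

# Offline stand-in for the real listing (two rows taken from this thread).
ap_example <- matrix(
  c("mlr3learners.gbm", "0.1.0", "LGPL-3",
    "mlr3learners.fnn", "0.2.0", "LGPL-3"),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("mlr3learners.gbm", "mlr3learners.fnn"),
    c("Package", "Version", "License")
  )
)

trim_package_listing(ap_example)

# With internet access, the same helper should work on the live listing:
# ap <- available.packages(repos = "https://mlr3learners.github.io/mlr3learners.drat")
# trim_package_listing(ap)
```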
> Machine-readable format:
> available.packages(repos = "https://mlr3learners.github.io/mlr3learners.drat")

It would have been nice to have posted the output too.
> options(width = 200)
> available.packages(repos = "https://mlr3learners.github.io/mlr3learners.drat")
Package Version Priority Depends Imports LinkingTo Suggests
mlr3learners.C50 "mlr3learners.C50" "0.1.0" NA "R (>= 3.1.0)" "C50, data.table, mlr3 (>= 0.1.7), mlr3misc, paradox, R6" NA "checkmate, rmarkdown, testthat"
mlr3learners.c50 "mlr3learners.c50" "0.1.2" NA "R (>= 3.1.0)" "C50, data.table, mlr3 (>= 0.1.7), mlr3misc, paradox, R6" NA "checkmate, rmarkdown, testthat"
mlr3learners.extratrees "mlr3learners.extratrees" "0.2.0" NA "R (>= 3.1.0)" "checkmate, data.table, extraTrees, mlr3, mlr3misc, paradox, R6" NA "rmarkdown, testthat"
mlr3learners.fnn "mlr3learners.fnn" "0.2.0" NA "R (>= 3.1.0)" "checkmate, data.table, FNN, mlr3, mlr3misc, paradox, R6" NA "rmarkdown, testthat"
mlr3learners.gbm "mlr3learners.gbm" "0.1.0" NA "R (>= 3.1.0)" "data.table, gbm, mlr3, mlr3misc, paradox, R6" NA "checkmate, testthat"
mlr3learners.kernlab "mlr3learners.kernlab" "0.2.0" NA "R (>= 3.1.0)" "data.table, kernlab, mlr3, mlr3misc, paradox, R6" NA "bibtex, checkmate, testthat"
mlr3learners.mboost "mlr3learners.mboost" "0.3.0" NA "R (>= 3.1.0)" "data.table, mlr3, mlr3misc, paradox, R6, mboost, withr" NA "checkmate, bibtex, testthat"
mlr3learners.partykit "mlr3learners.partykit" "0.2.0" NA "R (>= 3.1.0)" "data.table, mlr3, mlr3misc, paradox, R6" NA "bibtex, checkmate, partykit, testthat"
Enhances License License_is_FOSS License_restricts_use OS_type Archs MD5sum NeedsCompilation File
mlr3learners.C50 NA "LGPL-3" NA NA NA NA "e1a819fb277b7af59ec573f5ec592375" "no" NA
mlr3learners.c50 NA "LGPL-3" NA NA NA NA "2fd5ba51ba155ce890d9df31e29aa0e0" "no" NA
mlr3learners.extratrees NA "LGPL-3" NA NA NA NA "20763c7a1474efa44ace5f9330255f18" "no" NA
mlr3learners.fnn NA "LGPL-3" NA NA NA NA "ef54c27564a3c571dd626fdeea4dec58" "no" NA
mlr3learners.gbm NA "LGPL-3" NA NA NA NA "0d447219ff12a42b92b3341f0d9068f6" "no" NA
mlr3learners.kernlab NA "LGPL-3" NA NA NA NA "1481447ea6d469e67d6bc333640e0c82" "no" NA
mlr3learners.mboost NA "LGPL-3" NA NA NA NA "94dc921e0c41776cf37a59efd281d6bf" "no" NA
mlr3learners.partykit NA "LGPL-3" NA NA NA NA "4e335cc9c201ab10d90c969709bef746" "no" NA
Repository
mlr3learners.C50 "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.c50 "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.extratrees "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.fnn "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.gbm "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.kernlab "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.mboost "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.partykit "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
This demonstrates that it does not answer Raphael's request: you cannot see the learners provided by the packages. You would have to download and install them (which might still be the best way to go, in order to keep maintenance lightweight?).
Please note that the package name does NOT coincide with the learner name, and an extension package can and should contain multiple learners.
I also think this is too much information for the average user; it really doesn't need to be more than id and package (+ properties as a bonus).
Ok, then maintaining an extra file seems inevitable.
The reprex below creates meta-information which could be deployed as text files to mlr3learners.drat (`keys.txt` or `packages.txt`) in a daily cron run.
All that is needed is `dget(<file.txt>)` to read the respective piece of information.
This information can be used to auto-load/auto-install packages/learners behind the scenes.
The only restriction is that one needs internet access, but we could assert this.
So in summary we could have files that store
(This information could also be used to automate the creation of a nice HTML table, similar to the one we have in mlr2.)
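On the client side, using such a deployed file might look roughly like the sketch below. It assumes a hypothetical `packages.txt` that stores a named list mapping learner keys to required packages; the file layout, the helper names `read_registry()` and `missing_packages()`, and the example entries are all assumptions, and a temporary local file stands in for the deployed one.

```r
# Hypothetical registry content: learner key -> packages it needs (illustrative).
registry <- list(
  classif.C5.0 = "C50",
  regr.lm = "stats"
)

# What a scheduled job might deploy; written to a temp file here.
registry_file <- file.path(tempdir(), "packages.txt")
dput(registry, file = registry_file)

# Read the registry back; `source` is a file path or a connection.
read_registry <- function(source) {
  dget(source)
}

# Return the packages required by `key` that are not installed locally.
missing_packages <- function(key, registry) {
  if (!key %in% names(registry)) {
    stop(sprintf("Unknown learner key '%s'", key))
  }
  pkgs <- registry[[key]]
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

reg <- read_registry(registry_file)
missing_packages("regr.lm", reg)
```

A caller could pass a non-empty result to `install.packages()`, but only once the registry also records which repository each package comes from.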
library(mlr3)
library(mlr3learners)
library(mlr3proba)
library(magrittr)
extra_learners <- rownames(available.packages(repos = "https://mlr3learners.github.io/mlr3learners.drat"))
lapply(extra_learners, require, character.only = TRUE, quietly = TRUE)
keys <- mlr_learners$keys()
print(extra_learners)
#> [1] "mlr3learners.C50" "mlr3learners.c50"
#> [3] "mlr3learners.extratrees" "mlr3learners.fnn"
#> [5] "mlr3learners.gbm" "mlr3learners.kernlab"
#> [7] "mlr3learners.mboost" "mlr3learners.partykit"
dput(keys, file = paste0(tempdir(), "/keys.txt"))
dget(file = paste0(tempdir(), "/keys.txt"))
#> [1] "classif.C5.0" "classif.ctree" "classif.debug"
#> [4] "classif.extratrees" "classif.featureless" "classif.fnn"
#> [7] "classif.gamboost" "classif.gbm" "classif.glmboost"
#> [10] "classif.glmnet" "classif.kknn" "classif.ksvm"
#> [13] "classif.lda" "classif.log_reg" "classif.naive_bayes"
#> [16] "classif.qda" "classif.ranger" "classif.rpart"
#> [19] "classif.svm" "classif.xgboost" "dens.hist"
#> [22] "dens.kde" "dens.kdeKD" "dens.kdeKS"
#> [25] "dens.locfit" "dens.logspline" "dens.mixed"
#> [28] "dens.nonpar" "dens.pen" "dens.plug"
#> [31] "dens.spline" "regr.ctree" "regr.extratrees"
#> [34] "regr.featureless" "regr.fnn" "regr.gamboost"
#> [37] "regr.gbm" "regr.glmboost" "regr.glmnet"
#> [40] "regr.kknn" "regr.km" "regr.ksvm"
#> [43] "regr.lm" "regr.ranger" "regr.rpart"
#> [46] "regr.svm" "regr.xgboost" "surv.blackboost"
#> [49] "surv.coxph" "surv.cvglmnet" "surv.flexible"
#> [52] "surv.gamboost" "surv.gbm" "surv.glmboost"
#> [55] "surv.glmnet" "surv.kaplan" "surv.mboost"
#> [58] "surv.nelson" "surv.obliqueRSF" "surv.parametric"
#> [61] "surv.penalized" "surv.randomForestSRC" "surv.ranger"
#> [64] "surv.rpart" "surv.svm"
all_lrns = lrns(keys)
properties = mlr3misc::map(all_lrns, function(.x) .x$properties) %>%
setNames(keys)
package = mlr3misc::map(all_lrns, function(.x) .x$packages)
tibble::tibble(name = keys, package = package, properties = properties)
#> # A tibble: 65 x 3
#> name package properties
#> <chr> <list> <named list>
#> 1 classif.C5.0 <chr [1]> <chr [4]>
#> 2 classif.ctree <chr [1]> <chr [3]>
#> 3 classif.debug <chr [0]> <chr [3]>
#> 4 classif.extratrees <chr [1]> <chr [3]>
#> 5 classif.featureless <chr [0]> <chr [5]>
#> 6 classif.fnn <chr [1]> <chr [2]>
#> 7 classif.gamboost <chr [1]> <chr [2]>
#> 8 classif.gbm <chr [1]> <chr [5]>
#> 9 classif.glmboost <chr [1]> <chr [2]>
#> 10 classif.glmnet <chr [1]> <chr [3]>
#> # … with 55 more rows
Created on 2020-04-04 by the reprex package (v0.3.0)
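The tibble above keeps `package` and `properties` as list columns, which do not write cleanly to a flat text file. One possible way to flatten them, sketched here in base R, is to collapse each entry into a comma-separated string. The function name `flatten_learner_info()` and the example values are assumptions for illustration; the inputs mimic the shapes shown above (e.g. a learner with zero packages).

```r
# Collapse list columns into comma-separated strings so the table can be
# written with write.csv() or similar.
flatten_learner_info <- function(keys, packages, properties) {
  collapse <- function(x) unname(vapply(x, paste, character(1), collapse = ", "))
  data.frame(
    id = keys,
    package = collapse(packages),
    properties = collapse(properties),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Illustrative stand-ins for all_lrns metadata (values not verified).
keys <- c("classif.debug", "classif.C5.0")
packages <- list(character(0), "C50")
properties <- list(c("missings", "multiclass", "twoclass"), c("multiclass", "twoclass"))

flat <- flatten_learner_info(keys, packages, properties)
flat
```

An entry with no packages becomes an empty string rather than being dropped, so the table keeps one row per learner.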
I guess we can close this @mllg @berndbischl ?
It would be nice to have a permanent registry specifically for `mlr3learners` and `mlr3learners.<package>` that lists all available learners to install, i.e. like `mlr3::mlr_learners`, except not a dictionary that gets repopulated but instead a permanent list of all available learners that can be installed at any given time. If this was a table like `mlr::listLearners()` with properties, that would be a bonus!