mlr-org / mlr3learners

Recommended learners for mlr3
https://mlr3learners.mlr-org.com
GNU Lesser General Public License v3.0

Feature Request: Registry of learners #81

Closed: RaphaelS1 closed 6 days ago

RaphaelS1 commented 4 years ago

It would be nice to have a permanent registry, specifically for mlr3learners and mlr3learners.<package>, that lists all learners available to install, i.e. like mlr3::mlr_learners, except not a dictionary that gets repopulated but a permanent list of all learners that can be installed at any given time. If this were a table like mlr::listLearners() with properties, that would be a bonus!

berndbischl commented 4 years ago

so not what's there, currently in memory, but what's available for installation? but that function would need access to github? and it would need to download packages and install them just to be able to peek inside them?

berndbischl commented 4 years ago

ok, i also just read your other issue #82. the thing is: where would that registry live? and who maintains it, and how? currently it is very simple for a new party to add an extension learner package for mlr, without talking to us: you simply put it on github.

RaphaelS1 commented 4 years ago

Not necessarily, I'd imagine that this would just be a registry of strings, e.g. a data.table that whoever adds a new learner appends to once, so of the form

id               package  properties
classif.xgboost  xgboost  ...

It would make sense to live in mlr3learners but could also live in mlr3, it would be quite lightweight...
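A minimal sketch of what such a string registry could look like in base R. The helper names `register_learner` and `learners_with` are hypothetical, and the property strings are placeholders for illustration, not the learners' actual property sets:

```r
# Hypothetical registry: one row per learner, everything stored as plain strings.
# The property values are illustrative placeholders, not authoritative.
learner_registry <- data.frame(
  id         = c("classif.xgboost", "regr.xgboost", "classif.ranger"),
  package    = c("xgboost", "xgboost", "ranger"),
  properties = c("twoclass,multiclass", "weights", "twoclass,multiclass"),
  stringsAsFactors = FALSE
)

# Append a learner exactly once; duplicate ids are rejected.
register_learner <- function(registry, id, package, properties = "") {
  if (id %in% registry$id) {
    stop("learner '", id, "' is already registered")
  }
  new_row <- data.frame(
    id = id, package = package, properties = properties,
    stringsAsFactors = FALSE
  )
  rbind(registry, new_row)
}

# Return the ids of all learners whose property string contains `property`.
learners_with <- function(registry, property) {
  props <- strsplit(registry$properties, ",", fixed = TRUE)
  has_prop <- vapply(props, function(p) property %in% p, logical(1))
  registry$id[has_prop]
}

learner_registry <- register_learner(learner_registry, "regr.ranger", "ranger", "weights")
print(learner_registry[, c("id", "package")])
print(learners_with(learner_registry, "weights"))
```

Keeping every field a plain string means the table could be shipped as a small text file and read without loading any learner package.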

RaphaelS1 commented 4 years ago

> currently it is very simple for a new party to add an extension learner package for mlr, without talking to us, you simply put it on github.

Surely one of you checks this package though and verifies it's up to some level of standard? You could also ask the maintainer of the extension to put in a PR to make sure they are added to the registry. Ultimately, if they don't do it, then it's their loss.

berndbischl commented 4 years ago

well, no, what you are describing is a different process.

a) this is how it is, currently. you write a new learner extension package. you put it somewhere on github. we have given you enough unit-testing tools to demonstrate it works. now maybe the whole mlr3-team is on extended leave. you can still publish your package, everything works.

b) if we do what you propose, we now have to update package X on CRAN (mlr3learners, or mlr3, rather the first?) each time we have to update that table?

OTOH you COULD argue that we are already maintaining the wiki table on github? your whole issue here in one sentence is basically: "why is the wiki table not in machine readable format" correct?

RaphaelS1 commented 4 years ago

I'm not suggesting you push updates to CRAN for each learner. They can wait until the next release.

But yes, that is essentially the issue, because when working in R I don't want to go back and forth between R and GitHub.

mllg commented 4 years ago

Machine-readable format:

available.packages(repos = "https://mlr3learners.github.io/mlr3learners.drat")

mllg commented 4 years ago

Not saying that this is very convenient; we should really look into making the additional learners easier to discover and install.

mllg commented 4 years ago

Oh, and if you want properties ... yes, we might need to create a JSON file for this.

berndbischl commented 4 years ago

> Machine-readable format:
>
> available.packages(repos = "https://mlr3learners.github.io/mlr3learners.drat")

It would have been nice to have posted the output too.

> options(width = 200)
> available.packages(repos = "https://mlr3learners.github.io/mlr3learners.drat")
                        Package                   Version Priority Depends        Imports                                                          LinkingTo Suggests                               
mlr3learners.C50        "mlr3learners.C50"        "0.1.0" NA       "R (>= 3.1.0)" "C50, data.table, mlr3 (>= 0.1.7), mlr3misc, paradox, R6"        NA        "checkmate, rmarkdown, testthat"       
mlr3learners.c50        "mlr3learners.c50"        "0.1.2" NA       "R (>= 3.1.0)" "C50, data.table, mlr3 (>= 0.1.7), mlr3misc, paradox, R6"        NA        "checkmate, rmarkdown, testthat"       
mlr3learners.extratrees "mlr3learners.extratrees" "0.2.0" NA       "R (>= 3.1.0)" "checkmate, data.table, extraTrees, mlr3, mlr3misc, paradox, R6" NA        "rmarkdown, testthat"                  
mlr3learners.fnn        "mlr3learners.fnn"        "0.2.0" NA       "R (>= 3.1.0)" "checkmate, data.table, FNN, mlr3, mlr3misc, paradox, R6"        NA        "rmarkdown, testthat"                  
mlr3learners.gbm        "mlr3learners.gbm"        "0.1.0" NA       "R (>= 3.1.0)" "data.table, gbm, mlr3, mlr3misc, paradox, R6"                   NA        "checkmate, testthat"                  
mlr3learners.kernlab    "mlr3learners.kernlab"    "0.2.0" NA       "R (>= 3.1.0)" "data.table, kernlab, mlr3, mlr3misc, paradox, R6"               NA        "bibtex, checkmate, testthat"          
mlr3learners.mboost     "mlr3learners.mboost"     "0.3.0" NA       "R (>= 3.1.0)" "data.table, mlr3, mlr3misc, paradox, R6, mboost, withr"         NA        "checkmate, bibtex, testthat"          
mlr3learners.partykit   "mlr3learners.partykit"   "0.2.0" NA       "R (>= 3.1.0)" "data.table, mlr3, mlr3misc, paradox, R6"                        NA        "bibtex, checkmate, partykit, testthat"
                        Enhances License  License_is_FOSS License_restricts_use OS_type Archs MD5sum                             NeedsCompilation File
mlr3learners.C50        NA       "LGPL-3" NA              NA                    NA      NA    "e1a819fb277b7af59ec573f5ec592375" "no"             NA  
mlr3learners.c50        NA       "LGPL-3" NA              NA                    NA      NA    "2fd5ba51ba155ce890d9df31e29aa0e0" "no"             NA  
mlr3learners.extratrees NA       "LGPL-3" NA              NA                    NA      NA    "20763c7a1474efa44ace5f9330255f18" "no"             NA  
mlr3learners.fnn        NA       "LGPL-3" NA              NA                    NA      NA    "ef54c27564a3c571dd626fdeea4dec58" "no"             NA  
mlr3learners.gbm        NA       "LGPL-3" NA              NA                    NA      NA    "0d447219ff12a42b92b3341f0d9068f6" "no"             NA  
mlr3learners.kernlab    NA       "LGPL-3" NA              NA                    NA      NA    "1481447ea6d469e67d6bc333640e0c82" "no"             NA  
mlr3learners.mboost     NA       "LGPL-3" NA              NA                    NA      NA    "94dc921e0c41776cf37a59efd281d6bf" "no"             NA  
mlr3learners.partykit   NA       "LGPL-3" NA              NA                    NA      NA    "4e335cc9c201ab10d90c969709bef746" "no"             NA  
                        Repository                                                    
mlr3learners.C50        "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.c50        "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.extratrees "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.fnn        "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.gbm        "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.kernlab    "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.mboost     "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"
mlr3learners.partykit   "https://mlr3learners.github.io/mlr3learners.drat/src/contrib"

which demonstrates that this does not answer raphael's request? you cannot see the provided learners of the packages. you would have to download and install them. (which might still be the best way to go? in order to keep maintenance lightweight)
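For reference, a rough sketch of cutting that listing down to the columns a user would care about. To keep it runnable offline, `repo_index` is a hand-copied subset of the output above (with network access it would come straight from available.packages()), and `package_summary` is a hypothetical helper name. Note that the result still only names packages, not the learners inside them:

```r
# Offline stand-in for the available.packages() matrix, copied from the output above.
repo_index <- matrix(
  c("mlr3learners.gbm",     "0.1.0",
    "mlr3learners.kernlab", "0.2.0",
    "mlr3learners.mboost",  "0.3.0"),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("Package", "Version"))
)
rownames(repo_index) <- repo_index[, "Package"]

# Keep only package name and version as a plain data.frame.
package_summary <- function(index) {
  data.frame(
    package = unname(index[, "Package"]),
    version = unname(index[, "Version"]),
    stringsAsFactors = FALSE
  )
}

pkgs <- package_summary(repo_index)
print(pkgs)
```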

berndbischl commented 4 years ago

please note that the package name DOES not coincide with the learner-name. and an extension package can and should contain multiple learners

RaphaelS1 commented 4 years ago

I also think this is too much information for the average user, it really doesn't need to be more than: id, package (+ properties for a bonus)

mllg commented 4 years ago

Ok, then maintaining an extra file seems inevitable.

pat-s commented 4 years ago

The reprex below creates meta-information which could be deployed as text files to mlr3learners.drat (keys.txt or packages.txt) in a daily CRON run. All that is needed is dget(<file.txt>) to scrape the respective piece of information.

This information can be used to auto-load/auto-install packages/learners behind the scenes.

The only restriction is that one has internet access - but we could assert this.

So in summary we could have files that store

(This information could also be used to automate the creation of a nice HTML table, similar to the one we have in mlr2.)

library(mlr3)
library(mlr3learners)
library(mlr3proba)
library(magrittr)

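# Names of all extension packages listed in the drat repository
extra_learners <- rownames(available.packages(repos = "https://mlr3learners.github.io/mlr3learners.drat"))
# Attach each extension package so that its learners are registered in mlr_learners
lapply(extra_learners, require, character.only = TRUE, quietly = TRUE)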
keys <- mlr_learners$keys()
print(extra_learners)
#> [1] "mlr3learners.C50"        "mlr3learners.c50"       
#> [3] "mlr3learners.extratrees" "mlr3learners.fnn"       
#> [5] "mlr3learners.gbm"        "mlr3learners.kernlab"   
#> [7] "mlr3learners.mboost"     "mlr3learners.partykit"
dput(keys, file = paste0(tempdir(), "/keys.txt"))
dget(file = paste0(tempdir(), "/keys.txt"))
#>  [1] "classif.C5.0"         "classif.ctree"        "classif.debug"       
#>  [4] "classif.extratrees"   "classif.featureless"  "classif.fnn"         
#>  [7] "classif.gamboost"     "classif.gbm"          "classif.glmboost"    
#> [10] "classif.glmnet"       "classif.kknn"         "classif.ksvm"        
#> [13] "classif.lda"          "classif.log_reg"      "classif.naive_bayes" 
#> [16] "classif.qda"          "classif.ranger"       "classif.rpart"       
#> [19] "classif.svm"          "classif.xgboost"      "dens.hist"           
#> [22] "dens.kde"             "dens.kdeKD"           "dens.kdeKS"          
#> [25] "dens.locfit"          "dens.logspline"       "dens.mixed"          
#> [28] "dens.nonpar"          "dens.pen"             "dens.plug"           
#> [31] "dens.spline"          "regr.ctree"           "regr.extratrees"     
#> [34] "regr.featureless"     "regr.fnn"             "regr.gamboost"       
#> [37] "regr.gbm"             "regr.glmboost"        "regr.glmnet"         
#> [40] "regr.kknn"            "regr.km"              "regr.ksvm"           
#> [43] "regr.lm"              "regr.ranger"          "regr.rpart"          
#> [46] "regr.svm"             "regr.xgboost"         "surv.blackboost"     
#> [49] "surv.coxph"           "surv.cvglmnet"        "surv.flexible"       
#> [52] "surv.gamboost"        "surv.gbm"             "surv.glmboost"       
#> [55] "surv.glmnet"          "surv.kaplan"          "surv.mboost"         
#> [58] "surv.nelson"          "surv.obliqueRSF"      "surv.parametric"     
#> [61] "surv.penalized"       "surv.randomForestSRC" "surv.ranger"         
#> [64] "surv.rpart"           "surv.svm"

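# Instantiate every registered learner, then collect its properties and required packages
all_lrns = lrns(keys)
properties = mlr3misc::map(all_lrns, function(.x) .x$properties) %>%
  setNames(keys)
package = mlr3misc::map(all_lrns, function(.x) .x$packages)
# One row per learner; package and properties are list-columns
tibble::tibble(name = keys, package = package, properties = properties)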
#> # A tibble: 65 x 3
#>    name                package   properties  
#>    <chr>               <list>    <named list>
#>  1 classif.C5.0        <chr [1]> <chr [4]>   
#>  2 classif.ctree       <chr [1]> <chr [3]>   
#>  3 classif.debug       <chr [0]> <chr [3]>   
#>  4 classif.extratrees  <chr [1]> <chr [3]>   
#>  5 classif.featureless <chr [0]> <chr [5]>   
#>  6 classif.fnn         <chr [1]> <chr [2]>   
#>  7 classif.gamboost    <chr [1]> <chr [2]>   
#>  8 classif.gbm         <chr [1]> <chr [5]>   
#>  9 classif.glmboost    <chr [1]> <chr [2]>   
#> 10 classif.glmnet      <chr [1]> <chr [3]>   
#> # … with 55 more rows

Created on 2020-04-04 by the reprex package (v0.3.0)
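The "auto-load/auto-install behind the scenes" idea could then look roughly like the sketch below. Everything here is an assumption for illustration: the key-to-package mapping is made up (a real file would be generated by the CRON job), and `package_for` and `learner_status` are hypothetical helpers. It only reports what would need installing and installs nothing.

```r
# Hypothetical registry contents: learner key -> extension package providing it.
# The mapping is illustrative, not taken from the actual packages.
registry <- c(
  "classif.gbm"  = "mlr3learners.gbm",
  "regr.gbm"     = "mlr3learners.gbm",
  "classif.ksvm" = "mlr3learners.kernlab"
)

# Publish step: write the registry as a plain-text file.
registry_file <- file.path(tempdir(), "packages.txt")
dput(registry, file = registry_file)

# Client step: read the registry back.
registry_read <- dget(registry_file)

# Look up which package provides a learner key; NA if the key is unknown.
package_for <- function(key, registry) {
  unname(registry[key])
}

# Report whether the providing package is available locally.
learner_status <- function(key, registry) {
  pkg <- package_for(key, registry)
  if (is.na(pkg)) {
    return("unknown learner")
  }
  if (requireNamespace(pkg, quietly = TRUE)) "installed" else paste("needs", pkg)
}

print(package_for("regr.gbm", registry_read))
print(learner_status("classif.nope", registry_read))
```

A named character vector keeps the file tiny and makes the lookup a single subsetting call; properties would need a second file or a richer structure.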

sebffischer commented 6 days ago

I guess we can close this, @mllg @berndbischl?