Essential Learners - Githubissues

mllg commented 5 years ago

Here is a list of essential learners and their respective implementation for discussion:

[x] Featureless. In mlr3.
[x] Classification and regression trees: rpart (in mlr3). I'd like to keep this in mlr3 as rpart is shipped with R and I need it to run some basic tests and examples.
[x] Linear / Logistic Reg: lm(), glm
[x] Penalized regression: glmnet
[x] kNN: kknn.
[x] Naive Bayes: e1071.
[x] SVM: e1071
[x] Random Forest: ranger.
[x] Boosting: xgboost.
[x] Kriging: DiceKriging.
[ ] Neural Network: ?

Please, share your thoughts on the implementations and what is missing in this list.

@berndbischl @jakob-r @ja-thomas @larskotthoff @pat-s @Coorsaa @florianfendt @giuseppec @mb706 @zzawadz

pat-s commented 5 years ago

Penalized regression -> glmnet ?

ja-thomas commented 5 years ago

Good multinomial model is really important, but I don't really know which package implements this (except for dnn frameworks).

Also, do we care for heavy dependencies here? Most neural net frameworks come with a lot

berndbischl commented 5 years ago

Glm: glmnet offers penalties so I would prefer that to get ridge and lasso. It has also become pretty standard. Also note that we need to cover enough regression, especially simple LMs.

SVMs: both are fine but libsvm is more standard.

Knn: why not use package kknn? More flexible.

berndbischl commented 5 years ago

Lda. Imho not needed. But doesn't hurt I guess

berndbischl commented 5 years ago

Neural networks is either keras or nothing? @ja-thomas

ja-thomas commented 5 years ago

Keras has Really heavy dependencies (Python!) etc. Kind of annoying

berndbischl commented 5 years ago

Keras has Really heavy dependencies (Python!) etc. Kind of annoying

better alternative or nothing?

mllg commented 5 years ago

Glm: glmnet offers penalties so I would prefer that to get ridge and lasso. It has also become pretty standard. Also note that we need to cover enough regression, especially simple LMs.

AFAIK, although you can set the penalty term to 0, you don't get hypothesis tests for the betas from glmnet. So glmnet cannot replace linear / logistic regression.

Nevertheless, I agree that penalized regression is important and added to the list.

ja-thomas commented 5 years ago

I don't think there is a better alternative. But I would suggest to not encode possible architectures in the param set but only have one architecture parameter.

This would allow to tune over different optimizers/ learning rates etc. (everything which is set in the compile function), but not over the number, size and type of layers. Otherwise the paramset will get quite complex.

When we depend on keras anyways I would suggest to directly implement the multinomial model as a keras model. Or does anybody know a good package in R? There is mnlogit but I don't have any experience using it

ja-thomas commented 5 years ago

+1 for kknn

ja-thomas commented 5 years ago

What about

AdaBoost
Linear models by SGD (regression, binary, multiclass)
Featureless learner
1R
configurable one layer NNet (compare https://scikit-learn.org/stable/modules/neural_networks_supervised.html)

mllg commented 5 years ago

Featureless is in mlr3. About the other models @ja-thomas suggested: I don't consider them to be essential. More like nice to have. We should concentrate on the others first. Agreed?

ja-thomas commented 5 years ago

Fine with me, except maybe the Linear models by SGD, but if we implement the multiclass linear model (which we definitely want imo.) this is "free"

berndbischl commented 5 years ago

the conbfigurable 1layer NN is very nice. but this doesnt work if we dont have a package (or it is again keras)

ja-thomas commented 5 years ago

that would be keras again,

Linear SGD models, architecture based NN and configurable 1/2 layer NN would all be done in Keras.

mb706 commented 5 years ago

What about nnet for single layer neural nets? Also there are neuralnet and deepnet which are not perfect but may be a compromise for anns without large dependencies.

berndbischl commented 5 years ago

Martin: I don't like all of those. Nnet is really not modern and the other 2 very slow. In an "essential" package they don't fit IMHO

mllg commented 5 years ago

@Coorsaa Is this list up-to-date? IIRC you had a google doc with a table of learners you want to implement...

Coorsaa commented 5 years ago

Yes, this list was basically based on the one above in this issue. It could/should be updated though to the present implementation status.

berndbischl commented 5 years ago

if you maintain other stuff can you please crosslink this / and or clean up

ja-thomas commented 5 years ago

Multinomial model and Neuralnetwork is done by keras and will go in a seperate package, e.g., mlr3keras or mlr3deeplearning

mllg commented 5 years ago

We now have all essential learners (not all important hyperparamters are in the parameter sets though). Neural networks will probably go into an extra package.

mlr-org / mlr3learners

Essential Learners #4