mlr-org / mlr3learners

Recommended learners for mlr3
https://mlr3learners.mlr-org.com
GNU Lesser General Public License v3.0
91 stars 16 forks source link

kknn: No model is stored in `$model` after train & predict #86

Closed pat-s closed 4 years ago

pat-s commented 4 years ago

https://github.com/mlr-org/mlr3learners/blob/eb223fc855ef59f7f840a2b262a21377ff2a47f1/R/LearnerClassifKKNN.R#L48-L54

Because kknn::kknn gets called in the predict method.

@mllg Is there a reason for this extraordinary structure? This does not align with all others learners and checking the $model slot of a trained kknn model returns a list of task metadata and not a fitted kknn model. It seems that it was already implement like this in https://github.com/mlr-org/mlr3learners/pull/14.

library(mlr3learners)
library(mlr3)
lrn_kknn = lrn("classif.kknn")
lrn_kknn$train(tsk("iris"))

lrn_kknn$model
#> $formula
#> Species ~ .
#> NULL
#> 
#> $data
#>        Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#>   1:    setosa          1.4         0.2          5.1         3.5
#>   2:    setosa          1.4         0.2          4.9         3.0
#>   3:    setosa          1.3         0.2          4.7         3.2
#>   4:    setosa          1.5         0.2          4.6         3.1
#>   5:    setosa          1.4         0.2          5.0         3.6
#>  ---                                                            
#> 146: virginica          5.2         2.3          6.7         3.0
#> 147: virginica          5.0         1.9          6.3         2.5
#> 148: virginica          5.2         2.0          6.5         3.0
#> 149: virginica          5.4         2.3          6.2         3.4
#> 150: virginica          5.1         1.8          5.9         3.0
#> 
#> $pars
#> named list()

Created on 2020-04-04 by the reprex package (v0.3.0)

pat-s commented 4 years ago

After inspecting {kknn}, especially kknn::kknn(), in detail I understand why things are how they are. The package does not really comply to the normal R standards for model fitting. However, we should have documented this both internally and for the user.

(First I though we could make use of kknn::train.knnn() but this performs LOOCV and does return a different result object...)

So currently we are doing everything (train and predict) in .predict(). This leads to the issue that the $model slot is read-only and we can't store the actually fitted model. In addition, the $model slot contains the args which are used to fit the model and is actually a false-positive.

Let's consider what options we have here:

@berndbischl Your opinion?

berndbischl commented 4 years ago

i am not sure understand the issue. you, @pat-s, do understand that knn (the algorithm) in its basic form really does NOT have a training phase? this has nothing to do with the implementation of kknn. yes, everything happens in the predict-phase. we store nearly an empty model - except for a little bit of meta-info.

mllg commented 4 years ago

Let's consider what options we have here:

* At least do not store the false-positive information shown above in the `$model` slot -> we can get all of this info also within `.predict()` and do not need to carry it over from `.train()`

It is technically possible to retrieve the training data from the task passed to .predict(), but this would be a rather ugly hack. IMHO it is much cleaner to just store the data.

* Should we think about using a different package for the knn learner? The unorthodox implementation really causes a lot of troubles and inconsistencies that we cannot work around.

I don't see where this could cause trouble. The kknn learner in mlr is doing exactly the same. Also note that other packages, e.g. FNN, do not compute a "model".

mllg commented 4 years ago

So currently we are doing everything (train and predict) in .predict(). This leads to the issue that the $model slot is read-only and we can't store the actually fitted model.

It is possible to store result of kknn() in self$state$model during predict().

mllg commented 4 years ago

Prototype:

https://github.com/mlr-org/mlr3learners/pull/87

berndbischl commented 4 years ago

IMHO it is much cleaner to just store the data.

also for KNN the data-table are the parameters. so that's actually correct

pat-s commented 4 years ago

you, @pat-s, do understand that knn (the algorithm) in its basic form really does NOT have a training phase? this has nothing to do with the implementation of kknn. yes, everything happens in the predict-phase.

Ah ok my bad then! Was not aware of this. Never looked into knn details.

I don't see where this could cause trouble. The kknn learner in mlr is doing exactly the same. Also note that other packages, e.g. FNN, do not compute a "model".

ok thanks. I thought that it may cause trouble for functions operating on $model after a train() call. In such cases $model does not contain a fitted model.

Even though this is caused by the nature of the algorithm, shouldn't we doc this at least for the user?

berndbischl commented 4 years ago

Even though this is caused by the nature of the algorithm, shouldn't we doc this at least for the user?

you can add a super short sentence, what we do. but this is not exactly our "baustelle"

mllg commented 4 years ago

Fixed now so you can at least access whatever was returned by kknn during predict().