Closed: pat-s closed this issue 4 years ago
After inspecting {kknn}, especially `kknn::kknn()`, in detail, I understand why things are the way they are. The package does not really comply with the usual R conventions for model fitting. However, we should have documented this both internally and for the user.
(First I thought we could make use of `kknn::train.kknn()`, but this performs LOOCV and returns a different result object...)
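For context, a quick sketch of why `kknn::train.kknn()` is not a drop-in fit function; this assumes {kknn} is installed and uses the `$best.parameters` field documented by the package:

```r
# train.kknn() runs leave-one-out CV over k (and kernel) and returns a
# tuning-style object, not a fitted model that predicts on new data.
library(kknn)

cv_fit <- kknn::train.kknn(Species ~ ., data = iris, kmax = 15)
cv_fit$best.parameters  # best kernel and k found by LOOCV
```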
So currently we are doing everything (train and predict) in `.predict()`. This leads to the issue that the `$model` slot is read-only and we can't store the actually fitted model. In addition, the `$model` slot contains the arguments used to fit the model, which is actually a false positive.
Let's consider what options we have here:

* At least do not store the false-positive information shown above in the `$model` slot -> we can get all of this info also within `.predict()` and do not need to carry it over from `.train()`
* Should we think about using a different package for the knn learner? The unorthodox implementation really causes a lot of trouble and inconsistencies that we cannot work around.
@berndbischl Your opinion?
I am not sure I understand the issue. You, @pat-s, do understand that knn (the algorithm) in its basic form really does NOT have a training phase? This has nothing to do with the implementation of kknn. Yes, everything happens in the predict phase. We store a nearly empty model - except for a little bit of meta-info.
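To illustrate the point, a minimal sketch using plain {kknn} (not mlr3 code): there is no separate fit step; training and test data go into a single call, and the returned object already contains the predictions.

```r
library(kknn)

set.seed(1)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

# "Training" and prediction happen in one call
fit <- kknn::kknn(Species ~ ., train = train, test = test, k = 7)
head(fitted(fit))  # predicted classes for the test rows
```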
> Let's consider what options we have here:
> * At least do not store the false-positive information shown above in the `$model` slot -> we can get all of this info also within `.predict()` and do not need to carry it over from `.train()`
It is technically possible to retrieve the training data from the task passed to `.predict()`, but this would be a rather ugly hack. IMHO it is much cleaner to just store the data.
> * Should we think about using a different package for the knn learner? The unorthodox implementation really causes a lot of trouble and inconsistencies that we cannot work around.
I don't see where this could cause trouble. The kknn learner in mlr is doing exactly the same. Also note that other packages, e.g. FNN, do not compute a "model".
> So currently we are doing everything (train and predict) in `.predict()`. This leads to the issue that the `$model` slot is read-only and we can't store the actually fitted model.
It is possible to store the result of `kknn()` in `self$state$model` during `predict()`.
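A rough sketch of that idea (this is not the actual mlr3learners code; see the permalink further down for the real implementation, and note that the stored fields and the exact return convention of `.predict()` are assumptions here):

```r
.predict = function(task) {
  pars    <- self$param_set$get_values(tags = "predict")
  newdata <- task$data(cols = task$feature_names)

  # Train and predict in one go, as kknn::kknn() requires; assumes the
  # training data and formula were stashed in self$model during .train().
  m <- mlr3misc::invoke(kknn::kknn,
    formula = self$model$formula,
    train   = self$model$data,
    test    = newdata,
    .args   = pars
  )

  # Keep the kknn result (next to the stored training data) so it can be
  # inspected after predict().
  self$state$model$kknn <- m

  list(response = m$fitted.values, prob = m$prob)
}
```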
> IMHO it is much cleaner to just store the data.
Also, for KNN the data table essentially is the parameters, so that's actually correct.
> You, @pat-s, do understand that knn (the algorithm) in its basic form really does NOT have a training phase? This has nothing to do with the implementation of kknn. Yes, everything happens in the predict phase.
Ah OK, my bad then! I was not aware of this - never looked into the knn details.
> I don't see where this could cause trouble. The kknn learner in mlr is doing exactly the same. Also note that other packages, e.g. FNN, do not compute a "model".
OK, thanks. I thought that it might cause trouble for functions operating on `$model` after a `train()` call. In such cases `$model` does not contain a fitted model.
Even though this is caused by the nature of the algorithm, shouldn't we document this at least for the user?
> Even though this is caused by the nature of the algorithm, shouldn't we document this at least for the user?
You can add a super short sentence about what we do, but this is not really our problem to fix.
Fixed now, so you can at least access whatever was returned by kknn during `predict()`.
https://github.com/mlr-org/mlr3learners/blob/eb223fc855ef59f7f840a2b262a21377ff2a47f1/R/LearnerClassifKKNN.R#L48-L54
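For reference, a minimal usage sketch of what that fix enables (assumes mlr3, mlr3learners and kknn are installed; the exact contents of `$model` depend on the mlr3learners version):

```r
library(mlr3)
library(mlr3learners)

task    <- tsk("iris")
learner <- lrn("classif.kknn")

learner$train(task)
learner$model    # before predict(): only stored data / meta-info, no fitted kknn object

learner$predict(task)
learner$model    # after predict(): now also contains whatever kknn::kknn() returned
```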
Because `kknn::kknn` gets called in the predict method.

@mllg Is there a reason for this extraordinary structure? It does not align with all the other learners, and checking the `$model` slot of a trained kknn learner returns a list of task metadata and not a fitted kknn model. It seems that it was already implemented like this in https://github.com/mlr-org/mlr3learners/pull/14.

Created on 2020-04-04 by the reprex package (v0.3.0)