predict_types should be an active binding

pfistfl commented 2 years ago

Since I am stumbling over this for the nth time:

predict_type and predict_types are easy to confuse (and it happens to me quite a lot).

predict_type: The concrete type of prediction the learner should yield.
predict_types: The theoretical capabilities of the learner, i.e. which types of prediction it can yield.

I think for less involved users, this might be an even bigger problem.

Solution:

predict_types should IMHO be immutable (it is a property of the learner). -> Encode it as an active binding (AB) and, if the user tries to set it, point her/him to predict_type in the error message.
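A minimal sketch of what such a read-only binding could look like. It uses base R's `makeActiveBinding()` on a plain environment rather than an R6 class, purely to stay dependency-free. `new_learner()` and the error wording are hypothetical and not taken from mlr3.

```r
# Hypothetical stand-in for a learner object; not the mlr3 implementation.
new_learner = function(predict_types, predict_type = predict_types[1L]) {
  self = new.env()
  self$predict_type = predict_type  # mutable: the type the user asks for

  # Active binding: the function is called without an argument on read
  # and with one argument on attempted assignment.
  makeActiveBinding("predict_types", function(rhs) {
    if (!missing(rhs)) {
      stop("$predict_types is read-only; did you mean to set $predict_type?",
        call. = FALSE)
    }
    predict_types  # resolved lexically from the constructor argument
  }, self)

  self
}

learner = new_learner(c("response", "prob"))
learner$predict_types          # reading the capabilities works
learner$predict_type = "prob"  # setting the singular field works

# Assigning to the plural field raises the error with the pointer.
msg = tryCatch({
  learner$predict_types = "prob"
  NA_character_
}, error = function(e) conditionMessage(e))
```

In an R6 class the same function body would go into the `active` list, with the capabilities kept in a private field.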

More generally: Do we need predict_type?

Is there a single learner that uses it during training?
predict_type is mutable after training so we can break learners if they were to use it
If we e.g. want to mutate predict_types after training for BenchmarkResults, this is no longer possible due to the use of read-only ABs.
For most cases, it could just be an extra arg added to $predict instead.

Sidenote Default

It's super annoying (and IMHO unnecessary) to set predict_type = "prob". I always remember that when I try to use probabilistic measures AFTER having trained the model. And in 99.9% of cases it does not matter what predict type was set for the inducer, and I could change it post-hoc (which afaik works as long as the learner is not e.g. in a resample result, where things are way more difficult). Question: Should we not predict prob by default IF the learner can do it? Do we have any learners that cannot predict prob?
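As a rough illustration of the proposed default, here is a sketch that prefers "prob" when it is among the supported types. `default_predict_type()` is a hypothetical helper, not an existing mlr3 function, and falling back to the first supported type is an assumption.

```r
# Hypothetical helper: prefer "prob" when the learner supports it,
# otherwise fall back to the first supported type.
default_predict_type = function(predict_types) {
  stopifnot(is.character(predict_types), length(predict_types) >= 1L)
  if ("prob" %in% predict_types) "prob" else predict_types[1L]
}

default_predict_type(c("response", "prob"))  # learner that can predict prob
default_predict_type("response")             # learner that cannot
```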

mb706 commented 2 years ago

Is there a single learner that uses it during training?

Grepping a bit:

LearnerClassifSvm gives it as an argument to the model fit, idk how much it slows things down / makes things expensive or if this could just be the default / a hyperparameter.
Same story with LearnerClassifRanger. Also, it appears to have an effect on defaults?
Also LearnerClassifKSVM in mlr3extralearners.
LearnerRegrRanger sets keep.inbag = TRUE if predict_type is "se"
LearnerClassifGAMBoost and LearnerClassifGLMBoost do some setup if predict_type == "prob"; not sure if it would be a problem to do this by default.
Some learners use $predict_type during .train() to check that hyperparameters are compatible: LearnerClassifXgboost, LearnerRegrEarth

I'd conclude that for the few cases where predict_type is used in training, it could be a hyperparameter -- for xgboost and earth it already is, in a way, since predict_type is only used here to check if the hyperparameters are set up correctly. I think we could therefore just use $predict_type during prediction (or as an argument of $predict(); or both, with the latter overriding the former), throwing errors during prediction in the few cases where different HPs were needed during training.
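To make the idea concrete, here is a toy sketch in base R. Training only looks at a hyperparameter, prediction takes an optional `predict_type` argument that overrides the field, and an error is thrown at prediction time if the model was trained without the required setting. All names (`new_toy_learner()`, `train()`, `predict_toy()`, `keep_inbag`) are hypothetical and only loosely modelled on the ranger case above. The "model" is just a mean and a standard error.

```r
# Toy learner; every name here is hypothetical and only mimics the idea.
new_toy_learner = function(keep_inbag = FALSE) {
  self = new.env()
  self$predict_type = "response"  # field-level default
  self$keep_inbag = keep_inbag    # hyperparameter needed for "se" predictions
  self$model = NULL
  self
}

train = function(learner, y) {
  # Training only consults hyperparameters, never predict_type.
  learner$model = list(
    mean = mean(y),
    se = if (learner$keep_inbag) sd(y) / sqrt(length(y)) else NULL
  )
  invisible(learner)
}

predict_toy = function(learner, n, predict_type = NULL) {
  # The argument, when given, overrides the field.
  type = if (is.null(predict_type)) learner$predict_type else predict_type
  if (type == "se" && is.null(learner$model$se)) {
    stop("predict_type 'se' requires training with keep_inbag = TRUE",
      call. = FALSE)
  }
  out = list(response = rep(learner$model$mean, n))
  if (type == "se") out$se = rep(learner$model$se, n)
  out
}

l1 = train(new_toy_learner(keep_inbag = TRUE), c(1, 2, 3))
res_field = predict_toy(l1, 2)                    # uses the field: "response"
res_arg = predict_toy(l1, 2, predict_type = "se") # argument overrides the field

l2 = train(new_toy_learner(), c(1, 2, 3))
err = tryCatch(predict_toy(l2, 2, predict_type = "se"),
  error = function(e) conditionMessage(e))
```

With this shape, switching the predict type after training is harmless unless the trained model lacks what the requested type needs, in which case the failure is explicit.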

pfistfl commented 2 years ago

I guess to move forward here we would need the opinion of @berndbischl and @mllg?

mlr-org / mlr3

predict_types should be an active binding #851