mlr-org / mlr3pipelines

Dataflow Programming for Machine Learning in R
https://mlr3pipelines.mlr-org.com/
GNU Lesser General Public License v3.0

Some PipeOps need their own hash function #720

Closed: sebffischer closed this issue 1 year ago

sebffischer commented 1 year ago

The two learners below have the same hash, although they should not. The problem is that PipeOpLearner inherits its hash function from PipeOp, which does not take into account the hash of the learner that is passed during construction.

library(mlr3verse)
#> Loading required package: mlr3

at1 = auto_tuner(
  learner = lrn("regr.rpart"),
  measure = msr("regr.mse"),
  term_evals = 10,
  tuner = tnr("random_search"),
  resampling = rsmp("cv", folds = 3)
)

at2 = auto_tuner(
  learner = lrn("regr.rpart"),
  measure = msr("regr.mse"),
  term_evals = 10,
  tuner = tnr("random_search"),
  resampling = rsmp("holdout")
)

g1 = as_learner(ppl("robustify") %>>% at1)
g2 = as_learner(ppl("robustify") %>>% at2)

g1$hash
#> [1] "631881f0a872501a"
g2$hash
#> [1] "631881f0a872501a"

Created on 2023-05-21 with reprex v2.0.2
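A minimal, dependency-free sketch of why the inherited hash collides and what a per-class override would change. This is a toy model, not the mlr3pipelines API: the helpers `fingerprint()`, `pipeop_hash()`, `pipeop_learner_hash()` and `make_pipeop_learner()` are hypothetical, PipeOps are modelled as classed lists instead of R6 objects, and a plain string stands in for a real digest (a real implementation would hash the state, e.g. with `digest::digest()`). The wrapped learners' own hashes are assumed to differ, as they would for `at1` and `at2` if AutoTuner hashed its resampling.

```r
# Toy stand-in for a hash: a deterministic string built from the identifying
# state. A real implementation would pass this state to a digest function.
fingerprint = function(...) {
  parts = vapply(list(...), function(x) paste(deparse(x), collapse = ""), character(1))
  paste(parts, collapse = "|")
}

# Hypothetical base "PipeOp" hash: only class, id and hyperparameter values.
pipeop_hash = function(op) {
  fingerprint(class(op)[1L], op$id, op$param_vals)
}

# Hypothetical override for a wrapper PipeOp: additionally fold in the hash
# of the object that was passed at construction time.
pipeop_learner_hash = function(op) {
  fingerprint(pipeop_hash(op), op$learner$hash)
}

# Hypothetical constructor: the wrapped learner is extra construction state.
make_pipeop_learner = function(learner) {
  structure(
    list(id = learner$id, param_vals = list(), learner = learner),
    class = c("PipeOpLearner", "PipeOp")
  )
}

# Two wrapped learners with the same id but (assumed) different own hashes,
# loosely mirroring at1 (3-fold CV) and at2 (holdout) from the reprex.
op1 = make_pipeop_learner(list(id = "tuned_rpart", hash = "hash_cv3"))
op2 = make_pipeop_learner(list(id = "tuned_rpart", hash = "hash_holdout"))

identical(pipeop_hash(op1), pipeop_hash(op2))                 # TRUE: collision
identical(pipeop_learner_hash(op1), pipeop_learner_hash(op2)) # FALSE
```

The inherited hash only sees state that every PipeOp has, so two wrappers around different learners are indistinguishable; the override fixes this only to the extent that the wrapped learner's own hash is correct.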

sebffischer commented 1 year ago

I think that, in general, the hash function of PipeOp should check whether the PipeOp has non-standard construction arguments and, if it does, throw an error, so that a subclass does not silently inherit an incorrect hash method.
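One way the proposed safeguard could look, as a hedged sketch in the same toy model (classed lists rather than R6, hypothetical helper names). It assumes the "standard" construction arguments are `id` and `param_vals`; any other constructor argument is treated as state that the default hash cannot see, so the default refuses to hash unless the class declares that it overrides the hash.

```r
# Assumption: these are the only construction arguments the default hash covers.
standard_args = c("id", "param_vals")

# Hypothetical default hash with the proposed guard.
default_hash = function(op, constructor, hash_overridden = FALSE) {
  extra = setdiff(names(formals(constructor)), standard_args)
  if (length(extra) > 0L && !hash_overridden) {
    stop(sprintf(
      "Class '%s' has non-standard construction argument(s) %s and must implement its own hash.",
      class(op)[1L], paste0("'", extra, "'", collapse = ", ")
    ))
  }
  paste(class(op)[1L], op$id, paste(deparse(op$param_vals), collapse = ""), sep = "|")
}

# Hypothetical PipeOp with only standard arguments: the default hash is safe.
new_scale = function(id = "scale", param_vals = list()) {
  structure(list(id = id, param_vals = param_vals), class = c("PipeOpScale", "PipeOp"))
}

# Hypothetical PipeOp that wraps a learner: `learner` is non-standard state.
new_learner_op = function(learner, id = learner$id, param_vals = list()) {
  structure(list(id = id, param_vals = param_vals, learner = learner),
            class = c("PipeOpLearner", "PipeOp"))
}

h = default_hash(new_scale(), new_scale)
msg = tryCatch(
  default_hash(new_learner_op(list(id = "rpart")), new_learner_op),
  error = conditionMessage
)
```

Here `h` is an ordinary hash string, while `msg` holds the error message naming `'learner'`. In a real R6 class the check would have to inspect the `initialize` method instead of a free constructor function.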

mb706 commented 1 year ago

To the degree that the auto_tuner does not do its own hashing properly, there is nothing mlr3pipelines can realistically do.