mlr-org / mlr3

mlr3: Machine Learning in R - next generation
https://mlr3.mlr-org.com
GNU Lesser General Public License v3.0
948 stars, 85 forks

Using exports from mlr3fselect during multisession execution fails due to non-loaded package #1183

Closed: skysyzygy closed this issue 1 month ago

skysyzygy commented 1 month ago

mlr3fselect adds the column role always_included to mlr_reflection$col_roles when the package is loaded.

This causes failures when training with future::plan("multisession"): the parallel workers don't seem to load mlr3fselect, so they reject always_included as an unknown entry in the task's col_roles.

Error in .__Task__col_roles(self = self, private = private, super = super,  : 
  Assertion on 'names of col_roles' failed: Names must be a permutation of set {'feature','target','name','order','stratum','group','weight'}, but has extra elements {'always_included'}.
This happened PipeOp char_to_fct's $train()

Here is a working (sequential) MWE:

library(mlr3verse)
future::plan("sequential")

task <- tsk("zoo")
learner <- po("select") %>>% ppl("robustify") %>>% lrn("classif.rpart")
resample(task, learner, rsmp("cv"))  # runs without error

And a failing (multisession) MWE:

library(mlr3verse)
future::plan("multisession")

task <- tsk("zoo")
learner <- po("select") %>>% ppl("robustify") %>>% lrn("classif.rpart")
resample(task, learner, rsmp("cv"))  # fails with the col_roles assertion error shown above

Here are the package versions I'm using

> mlr3verse::mlr3verse_info()
Key: <package>
             package version
              <char>  <char>
 1:            bbotk   1.0.1
 2:      mlr3cluster   0.1.9
 3:         mlr3data   0.7.0
 4:      mlr3filters   0.8.0
 5:      mlr3fselect   1.1.0
 6:    mlr3hyperband   0.6.0
 7:     mlr3learners   0.7.0
 8:          mlr3mbo   0.2.4
 9:         mlr3misc  0.15.1
10:    mlr3pipelines   0.7.0
11:       mlr3tuning   1.0.0
12: mlr3tuningspaces   0.5.1
13:          mlr3viz   0.9.0
14:          paradox   1.0.1

be-marc commented 1 month ago

Hey, sorry for the late reply; our team was on vacation. Thanks for reporting this bug. A workaround should be to load only the required packages instead of mlr3verse:

library(mlr3)
library(mlr3pipelines)
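Applied to the failing MWE, the workaround would look roughly like this. This is an untested sketch: it assumes a fresh R session in which nothing else loads mlr3fselect, so the extra column role is never registered in the main session and the workers see the same set of roles.

```r
# Attach only the packages the pipeline needs. Attaching mlr3verse would also
# load mlr3fselect, whose .onLoad() registers an extra column role that the
# multisession workers do not know about.
library(mlr3)
library(mlr3pipelines)
future::plan("multisession")

task <- tsk("zoo")
learner <- po("select") %>>% ppl("robustify") %>>% lrn("classif.rpart")
rr <- resample(task, learner, rsmp("cv"))

# Return to sequential processing afterwards
future::plan("sequential")
```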

skysyzygy commented 1 month ago

Thanks for getting back to me! It's actually happening inside a package, i.e. without explicit imports, so I'm not sure how to implement this workaround.

From what I can gather, the issue is that po("select") causes an import from mlr3fselect, which has an .onLoad() that modifies col_roles. For some reason, though, this isn't happening in the future workers?
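One way to check this on a given machine is to ask a worker directly whether the mlr3fselect namespace is loaded. A minimal sketch using the future package; the variable names are illustrative, and the worker result only says something about a freshly started worker, since multisession workers are reused across futures.

```r
library(future)
plan(multisession, workers = 2)

# Is mlr3fselect loaded in the main session (e.g. via library(mlr3verse))?
loaded_in_main <- isNamespaceLoaded("mlr3fselect")

# Evaluate the same check on a background worker. The expression uses no
# object from mlr3fselect, so future has no reason to load it there.
loaded_on_worker <- value(future(isNamespaceLoaded("mlr3fselect")))

cat("main session:", loaded_in_main, "\nworker:", loaded_on_worker, "\n")

# Switch back to sequential processing
plan(sequential)
```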

be-marc commented 1 month ago

Yes, it has something to do with that. When mlr3verse is loaded, mlr3fselect is loaded too, and it adds a new col_role. However, nothing from mlr3fselect appears in your workflow, so mlr3fselect is not loaded on the workers. The workers then raise an error because they do not know the task's new col_role.

There is a fix now. You can test the new versions with pak::pak(c("mlr-org/mlr3", "mlr-org/mlr3fselect")).
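The failure mode can be reproduced without mlr3 at all. The base-R sketch below uses hypothetical stand-ins, make_registry() for the reflection object and check_roles() for the assertion in the error message: a registry extended in the main session, as an .onLoad() hook would do, is not extended in a fresh worker process, so the worker rejects the extra role.

```r
library(parallel)

# Hypothetical stand-in for a package-level registry such as mlr_reflection:
# an environment holding the allowed column roles.
make_registry <- function() {
  reg <- new.env()
  reg$col_roles <- c("feature", "target", "name", "order",
                     "stratum", "group", "weight")
  reg
}

# Hypothetical validator mimicking the col_roles assertion.
check_roles <- function(roles, registry) {
  extra <- setdiff(roles, registry$col_roles)
  if (length(extra) > 0) {
    stop("extra elements: ", paste(extra, collapse = ", "))
  }
  TRUE
}

# Main session: the registry is extended, as an .onLoad() hook would do.
main_registry <- make_registry()
main_registry$col_roles <- c(main_registry$col_roles, "always_included")
task_roles <- main_registry$col_roles
main_ok <- check_roles(task_roles, main_registry)

# Worker: a fresh R process builds its own registry without the hook.
cl <- makeCluster(1)
worker_result <- clusterCall(cl, function(roles, make_registry, check_roles) {
  tryCatch(check_roles(roles, make_registry()),
           error = function(e) conditionMessage(e))
}, task_roles, make_registry, check_roles)[[1]]
stopCluster(cl)

print(main_ok)        # TRUE
print(worker_result)  # "extra elements: always_included"
```

The sequential plan never hits this because everything runs in the one session where the registry was extended.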

skysyzygy commented 1 month ago

😲 So fast, and an elegant fix. Thank you!

From: Marc Becker. Sent: Friday, October 18, 2024 1:13 AM. To: mlr-org/mlr3. Cc: Sky Syzygy; Author. Subject: Re: [mlr-org/mlr3] Using exports from mlr3fselect during multisession execution fails due to non-loaded package (Issue #1183)
