mlr-org / mlr3pipelines

Dataflow Programming for Machine Learning in R
https://mlr3pipelines.mlr-org.com/
GNU Lesser General Public License v3.0

getting variable importance measure (iml) from a resample result using pipes. #685

Closed by gbiele 2 years ago

gbiele commented 2 years ago

Hi, thanks for all the work on the mlr3 package ecosystem!

I am trying to calculate variable importance as described here after fitting an xgboost model along these lines:

task = as_task_classif(dt, task_type = "classif", target = "Y")

my_pipe =
  po("scale") %>>%
  po("encode") %>>%
  po("learner", learner = lrn("classif.xgboost"))

G_learner = GraphLearner$new(my_pipe)

rr = tune_nested(
        task = task,
        learner = G_learner,
        ...
      )

This call returns a ResampleResult R6 object, but I cannot figure out how to calculate variable importance as described here, or how to find the LearnerClassifXgboost object so that I can use its importance method as described here.

I also found this Stack Overflow question and answer and, based on it, tried unsuccessfully to calculate variable importance from an AutoTuner object in the ResampleResult rr.

So my question is: How can I calculate variable importance given a ResampleResult?

gbiele commented 2 years ago

Finally figured it out following the pattern described here.

  1. Get an AutoTuner object from the ResampleResult object rr: autotuner = rr$learners[[1]]
  2. Extract the GraphLearner object from the AutoTuner object: graphlearner = autotuner$learner
  3. Get the LearnerClassifXgboost object from graphlearner: xgboostlearner = graphlearner$graph$pipeops$classif.xgboost$learner
  4. Fill the LearnerClassifXgboost object xgboostlearner with the fitted model: xgboostlearner$state = graphlearner$model$classif.xgboost
  5. Get the variable importance: xgboostlearner$importance()
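
For anyone landing here later, the five steps above can be put together roughly as follows. This is only a sketch under a few assumptions that are not from the original post: the built-in "sonar" task stands in for my data, the search space, resampling schemes and terminator are placeholders, and I call resample() on an AutoTuner directly (instead of tune_nested) so that store_models = TRUE is explicit and the fitted models are kept in rr.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3learners)  # provides classif.xgboost (needs the xgboost package)
library(mlr3tuning)
library(paradox)

# Assumption: the built-in sonar task replaces the original data
task = tsk("sonar")

graph = po("scale") %>>%
  po("encode") %>>%
  po("learner", learner = lrn("classif.xgboost", nrounds = 10))
graph_learner = GraphLearner$new(graph)

# Placeholder search space; parameter ids are prefixed with the PipeOp id
search_space = ps(classif.xgboost.eta = p_dbl(0.1, 0.5))

at = AutoTuner$new(
  tuner = tnr("random_search"),
  learner = graph_learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 2),
  search_space = search_space
)

# Outer resampling; store_models = TRUE keeps the fitted AutoTuners in rr
rr = resample(task, at, rsmp("cv", folds = 3), store_models = TRUE)

# Apply steps 1 to 5 to every outer fold instead of only the first one
importance_per_fold = lapply(rr$learners, function(autotuner) {
  graphlearner = autotuner$learner
  xgboostlearner = graphlearner$graph$pipeops$classif.xgboost$learner
  xgboostlearner$state = graphlearner$model$classif.xgboost
  xgboostlearner$importance()
})

importance_per_fold[[1]]
```

Each list element is a named numeric vector with one importance score per feature used by the model of that outer fold. Because every fold is tuned and fitted on different data, the scores will differ between folds.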