Why is the mlr3 randomForest importance different from the randomForest package?

slecee commented 10 months ago

Description: I expected both approaches to give the same importance values, but the results differ.

Reproducible example

library(mlr3)
library(mlr3extralearners)

tasks = as_task_classif(iris, target = "Species")
learners = lrn("classif.randomForest", predict_type = "prob", importance = "gini")
set.seed(123, kind = "Mersenne-Twister")
split = partition(tasks)
split
a = learners$train(tasks, row_ids = split$train)
a$model$importance

                  setosa  versicolor  virginica MeanDecreaseAccuracy MeanDecreaseGini
Petal.Length 0.318035851 0.266299298 0.30394373          0.291450470        29.166804
Petal.Width  0.343312702 0.285824160 0.24060252          0.287471876        29.279836
Sepal.Length 0.044516441 0.017425256 0.03575046          0.032833987         7.045327
Sepal.Width  0.007014524 0.009913653 0.00355260          0.006423498         1.783527

library(randomForest)

set.seed(123, kind = "Mersenne-Twister")
tmp <- randomForest(iris[split$train, 1:4],
                    iris$Species[split$train],
                    importance = TRUE)
tmp[["importance"]]
                  setosa versicolor   virginica MeanDecreaseAccuracy MeanDecreaseGini
Sepal.Length 0.028980891 0.01123579 0.041315479          0.027508393         6.729108
Sepal.Width  0.007498441 0.01019763 0.006658933          0.008441336         1.853704
Petal.Length 0.300187608 0.25654441 0.310950138          0.285067350        28.924225
Petal.Width  0.361163535 0.29720214 0.250040471          0.299030008        29.750160
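The two tables list the features in different orders (the mlr3 output is alphabetical, the direct call follows the iris column order), so comparing them line by line pairs the wrong features. A minimal base-R sketch that aligns the MeanDecreaseGini column by feature name before taking differences; the values are copied from the two outputs above and the variable names are illustrative only. For the full matrices, `tmp$importance[rownames(a$model$importance), ]` performs the same alignment.

```r
# MeanDecreaseGini values copied from the two outputs above
gini_mlr3 <- c(Petal.Length = 29.166804, Petal.Width = 29.279836,
               Sepal.Length = 7.045327,  Sepal.Width = 1.783527)
gini_rf   <- c(Sepal.Length = 6.729108,  Sepal.Width = 1.853704,
               Petal.Length = 28.924225, Petal.Width = 29.750160)

# Reorder the randomForest values to the mlr3 feature order by name;
# comparing by position would pair Petal.Length with Sepal.Length.
aligned_rf <- gini_rf[names(gini_mlr3)]

# Per-feature difference and the largest absolute deviation
gini_diff <- gini_mlr3 - aligned_rf
print(round(gini_diff, 6))
max_abs_diff <- max(abs(gini_diff))
print(max_abs_diff)
```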

be-marc commented 10 months ago

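You call partition() after setting the seed. partition() draws from that seed to sample the random train/test split, so by the time the learner is trained the RNG state is no longer the one your direct randomForest() call starts from. The importance values are equal when you move the partition() call before set.seed().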

library(mlr3)
library(mlr3extralearners)

tasks = as_task_classif(iris, target = "Species")
learners = lrn("classif.randomForest", predict_type = "prob", importance = "gini")
split = partition(tasks)
split
set.seed(123, kind = "Mersenne-Twister")
a = learners$train(tasks, row_ids = split$train)
a$model$importance

library(randomForest)

set.seed(123, kind = "Mersenne-Twister")
tmp <- randomForest(iris[split$train, 1:4],
                    iris$Species[split$train],
                    importance = TRUE)

tmp[["importance"]]
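The RNG effect described in the reply can be reproduced with base R alone. In this sketch `sample()` stands in for `partition()` and `runif()` stands in for the random draws made while the forest is grown; both are simplifications, not the calls mlr3 or randomForest make internally.

```r
# Original order: seed, then split, then model.
# sample() is a stand-in for partition(); runif() for the forest's draws.
set.seed(123, kind = "Mersenne-Twister")
train_idx <- sample(150, 100)          # the split consumes random numbers...
draws_after_split <- runif(3)          # ...so the "model" sees a later RNG state

# Direct call: seed, then model, with nothing consuming the RNG in between
set.seed(123, kind = "Mersenne-Twister")
draws_fresh <- runif(3)

print(identical(draws_after_split, draws_fresh))  # FALSE

# Suggested order: split first, reseed right before the model
train_idx <- sample(150, 100)
set.seed(123, kind = "Mersenne-Twister")
draws_reseeded <- runif(3)

print(identical(draws_reseeded, draws_fresh))     # TRUE
```

Reseeding immediately before each model call is what makes the two runs start from the same RNG state, regardless of how many random numbers earlier steps used.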

slecee commented 10 months ago

The results still differ with this code as well.
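One possible cause of the remaining difference, which this thread does not confirm: the mlr3 output lists the features as Petal.Length, Petal.Width, Sepal.Length, Sepal.Width, while the direct call uses the iris column order Sepal.Length, Sepal.Width, Petal.Length, Petal.Width. If the learner draws candidate split variables by column position, the same seed then selects different features. The sketch below is a simplified illustration with a hypothetical `draw_candidates()` helper, not randomForest's internal code.

```r
# Column orders as they appear in the two importance tables
cols_rf   <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
cols_mlr3 <- c("Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width")

# randomForest's default mtry for classification is floor(sqrt(p))
mtry <- floor(sqrt(length(cols_rf)))

# Hypothetical helper: draw mtry candidate variables by position
draw_candidates <- function(cols, seed, mtry) {
  set.seed(seed, kind = "Mersenne-Twister")
  pos <- sample(length(cols), mtry)   # positions, not names
  list(pos = pos, vars = cols[pos])
}

a_rf   <- draw_candidates(cols_rf,   123, mtry)
a_mlr3 <- draw_candidates(cols_mlr3, 123, mtry)

print(identical(a_rf$pos, a_mlr3$pos))    # TRUE: same positions drawn
print(identical(a_rf$vars, a_mlr3$vars))  # FALSE: positions map to other features
```

If column order is the cause, one thing to try is passing the columns in the task's order, `iris[split$train, tasks$feature_names]`, to randomForest(). Whether the outputs then match exactly also depends on how the mlr3extralearners wrapper builds its call (for example whether it uses the formula interface), so treat this as a hypothesis to test.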

mlr-org/mlr3 issue #974