Machine Learning with the mlr3 Toolkit (4): Resampling and Benchmarking

by R语言学堂



Reference for this post:

https://mlr3book.mlr-org.com/chapters/chapter3/evaluation_and_benchmarking.html

Contents:

1 Resampling

1.1 Resampling methods
1.2 Running resampling
1.3 Predictions
1.4 Model evaluation
1.5 Grouping and stratification

2 Benchmarking

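1 Resampling

In the previous two posts, we randomly split the original data into a training set and a test set, using the training set to fit the model and the test set to evaluate it. Splitting the data into training and test sets in this way is called resampling.

1.1 Resampling methods

There are many resampling methods, such as holdout, k-fold cross-validation, subsampling, and the bootstrap. The partition() function introduced earlier uses the holdout method.

The resampling methods built into mlr3 are stored in the dictionary mlr_resamplings: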

library(mlr3)

mlr_resamplings
## <DictionaryResampling> with 9 stored values
## Keys: bootstrap, custom, custom_cv, cv, holdout, insample, loo,
##   repeated_cv, subsampling

Converting the dictionary to a table shows each method's parameters:

as.data.table(mlr_resamplings)
##            key                         label        params iters
## 1:   bootstrap                     Bootstrap ratio,repeats    30
## 2:      custom                 Custom Splits                  NA
## 3:   custom_cv Custom Split Cross-Validation                  NA
## 4:          cv              Cross-Validation         folds    10
## 5:     holdout                       Holdout         ratio     1
## 6:    insample           Insample Resampling                   1
## 7:         loo                 Leave-One-Out                  NA
## 8: repeated_cv     Repeated Cross-Validation folds,repeats   100
## 9: subsampling                   Subsampling ratio,repeats    30

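The following briefly describes several of these methods and how they differ:

Holdout randomly splits the observations into a training set and a test set at a given ratio (by default 2:1, i.e. ratio = 2/3 of the observations go to training);
k-fold cross-validation splits the observations into k roughly equal folds (10 by default); each iteration uses 1 fold as the test set and the remaining k-1 folds as the training set, so k intermediate models are trained;
Leave-one-out cross-validation is a special case of cross-validation in which k equals the sample size, so every test set contains exactly 1 observation;
Subsampling is equivalent to repeating holdout several times (30 by default), so it also trains several intermediate models; unlike k-fold cross-validation, the test sets of different iterations can overlap;
The bootstrap differs from subsampling in that the training set is drawn with replacement, so an observation can appear in the training set more than once, and the observations that were never drawn form the test set; bootstrap sampling can also be repeated (30 times by default), so it too trains several intermediate models.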
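To make these differences concrete, the base-R sketch below generates training/test row indices for holdout, k-fold cross-validation, subsampling and the bootstrap on mtcars. The helpers holdout_split(), cv_splits(), subsampling_splits() and bootstrap_splits() are hypothetical illustrations, not mlr3's internal code; their defaults simply follow the description above (the bootstrap ratio of 1 is an assumption).

```r
# Minimal base-R sketch of how each scheme could split row indices.
# All helper functions here are hypothetical, not mlr3 internals.
set.seed(1)
n <- nrow(mtcars)  # 32 observations

# Holdout: one random split; ratio = share of rows used for training
holdout_split <- function(n, ratio = 2/3) {
  train <- sample.int(n, round(ratio * n))
  list(train = train, test = setdiff(seq_len(n), train))
}

# k-fold CV: every row goes to exactly one of k folds; fold i is the
# test set and the remaining k - 1 folds form the training set
cv_splits <- function(n, k = 10) {
  fold <- sample(rep_len(seq_len(k), n))
  lapply(seq_len(k), function(i) {
    list(train = which(fold != i), test = which(fold == i))
  })
}

# Subsampling: holdout repeated `repeats` times; test sets may overlap
subsampling_splits <- function(n, ratio = 2/3, repeats = 30) {
  lapply(seq_len(repeats), function(i) holdout_split(n, ratio))
}

# Bootstrap: training rows drawn WITH replacement; rows never drawn
# (the out-of-bag rows) form the test set
bootstrap_splits <- function(n, ratio = 1, repeats = 30) {
  lapply(seq_len(repeats), function(i) {
    train <- sample.int(n, round(ratio * n), replace = TRUE)
    list(train = train, test = setdiff(seq_len(n), train))
  })
}

cv5 <- cv_splits(n, k = 5)
sapply(cv5, function(s) length(s$test))  # 7 7 6 6 6
boot <- bootstrap_splits(n, repeats = 5)
any(duplicated(boot[[1]]$train))  # duplicates are expected with replacement
```

With 32 rows and 5 folds, two folds necessarily get 7 rows and three get 6; this uneven split is what makes the macro and micro averages in 1.4 differ.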

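The official help page for each resampling method is available in the mlr3 package under mlr_resamplings_*, where * is the method's key, e.g. mlr_resamplings_holdout.

For more on cross-validation, see the following post:

Cross-validation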

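1.2 Running resampling

Resampling a learning task takes two steps and two functions: first, construct the resampling method with rsmp(); second, apply it to the task and learner with resample().

Construct a few resampling methods: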

set.seed(0117)

## Holdout
holdout = rsmp("holdout", ratio = 0.8)

## Cross-validation
cv = rsmp("cv", folds = 5)

## Subsampling
subsmp = rsmp("subsampling", ratio = 0.8, repeats = 5)

## Bootstrap
bootstrap = rsmp("bootstrap", ratio = 0.8, repeats = 5)

The parameters each method accepts are listed in the params column of as.data.table(mlr_resamplings).

Apply the resampling methods to the learning task:

data("mtcars")
## Task and learner
task = as_task_classif(mtcars, target = "vs", 
                       positive = "1")
learn = lrn("classif.rpart", predict_type = "prob")

## Apply each resampling method to the task
r1 = resample(task, learn, resampling = holdout)
## INFO  [19:20:20.870] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 1/1)

r2 = resample(task, learn, resampling = cv)
## INFO  [19:20:20.960] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 1/5)
## INFO  [19:20:21.028] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 2/5)
## INFO  [19:20:21.044] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 3/5)
## INFO  [19:20:21.062] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 4/5)
## INFO  [19:20:21.081] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 5/5)

r3 = resample(task, learn, resampling = subsmp)
## INFO  [19:20:21.124] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 1/5)
## INFO  [19:20:21.142] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 2/5)
## INFO  [19:20:21.160] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 3/5)
## INFO  [19:20:21.179] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 4/5)
## INFO  [19:20:21.194] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 5/5)

r4 = resample(task, learn, resampling = bootstrap)
## INFO  [21:38:28.642] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 1/5)
## INFO  [21:38:28.663] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 2/5)
## INFO  [21:38:28.682] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 3/5)
## INFO  [21:38:28.701] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 4/5)
## INFO  [21:38:28.721] [mlr3] Applying learner 'classif.rpart' on task 'mtcars' (iter 5/5)

Apart from holdout, which has a single iteration, each method produced 5 iterations.

1.3 Predictions

During resampling, resample() trains and predicts in a single step, instead of first calling the train() method and then the predict() method as in the previous post.
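Conceptually, resample() runs a loop like the base-R sketch below: for each split it fits an intermediate model on the training rows, predicts the test rows and records a score. This is an illustration only, not mlr3's implementation; to avoid extra packages, a logistic regression glm(vs ~ mpg) stands in for classif.rpart, and cv_folds() is a hypothetical helper.

```r
# Sketch of what a resampling run does conceptually (not mlr3 code).
# Assumption: glm(vs ~ mpg) replaces classif.rpart; cv_folds() is hypothetical.
set.seed(117)
data(mtcars)

cv_folds <- function(n, k) sample(rep_len(seq_len(k), n))

fold <- cv_folds(nrow(mtcars), k = 5)
ce <- numeric(5)
for (i in seq_len(5)) {
  train <- mtcars[fold != i, ]
  test  <- mtcars[fold == i, ]
  # intermediate model: trained on this iteration's training set only
  fit <- glm(vs ~ mpg, data = train, family = binomial)
  prob1 <- predict(fit, newdata = test, type = "response")
  response <- ifelse(prob1 > 0.5, 1, 0)
  ce[i] <- mean(response != test$vs)  # classification error on this test set
}
ce        # one score per intermediate model
mean(ce)  # unweighted mean over iterations

# A model trained on all rows (e.g. for deployment) is fitted separately
final_fit <- glm(vs ~ mpg, data = mtcars, family = binomial)
```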

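The models trained on the individual training sets are called intermediate models. Note that resample() does not fit a final model on the full data; if one is needed, train the learner on the whole task separately, e.g. with learn$train(task). Once the resample result is available, the predictions() method returns each intermediate model's predictions on its own test set: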

r3$predictions()
## [[1]]
## <PredictionClassif> for 6 observations:
##  row_ids truth response     prob.1    prob.0
##        1     0        0 0.06666667 0.9333333
##        8     1        1 1.00000000 0.0000000
##       10     1        1 1.00000000 0.0000000
##       17     0        0 0.06666667 0.9333333
##       29     0        0 0.06666667 0.9333333
##       30     0        0 0.06666667 0.9333333
## 
## [[2]]
## <PredictionClassif> for 6 observations:
##  row_ids truth response prob.1 prob.0
##        1     0        1      1      0
##        2     0        1      1      0
##       16     0        0      0      1
##       23     0        0      0      1
##       27     0        1      1      0
##       28     1        1      1      0
## 
## ....

There are 5 predictions in total; only the first two are shown here.

The prediction() method returns the combined prediction across all iterations:

r3$prediction()
## <PredictionClassif> for 30 observations:
##     row_ids truth response     prob.1    prob.0
##           1     0        0 0.06666667 0.9333333
##           8     1        1 1.00000000 0.0000000
##          10     1        1 1.00000000 0.0000000
## ---                                            
##          23     0        0 0.06250000 0.9375000
##          27     0        0 0.06250000 0.9375000
##          31     0        0 0.06250000 0.9375000
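In effect, the combined prediction stacks the test-set predictions of all iterations. The base-R sketch below uses the first three rows of each of the two predictions printed above as hypothetical stand-ins for r3$predictions(). Because subsampling test sets can overlap, the same row_ids (here row 1) can occur more than once, which is why 5 test sets of 6 rows give 30 entries for the 32-row data set.

```r
# Sketch: combining per-iteration predictions by stacking them.
# Rows copied from the first two predictions printed above (subset only).
pred_list <- list(
  data.frame(row_ids = c(1, 8, 10), truth = c(0, 1, 1), response = c(0, 1, 1)),
  data.frame(row_ids = c(1, 2, 16), truth = c(0, 0, 0), response = c(1, 1, 0))
)
combined <- do.call(rbind, pred_list)
nrow(combined)                     # 6: the sum of the test-set sizes
any(duplicated(combined$row_ids))  # TRUE: row 1 is in both test sets
```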

1.4 Model evaluation

Scoring a resample result with a measure returns the score of every intermediate model:

r1$score(msr("classif.ce"))
##    task_id    learner_id resampling_id iteration classif.ce
## 1:  mtcars classif.rpart       holdout         1  0.3333333
## Hidden columns: task, learner, resampling, prediction

r2$score(msr("classif.ce"))
##    task_id    learner_id resampling_id iteration classif.ce
## 1:  mtcars classif.rpart            cv         1  0.0000000
## 2:  mtcars classif.rpart            cv         2  0.0000000
## 3:  mtcars classif.rpart            cv         3  0.0000000
## 4:  mtcars classif.rpart            cv         4  0.0000000
## 5:  mtcars classif.rpart            cv         5  0.1666667
## Hidden columns: task, learner, resampling, prediction

r3$score(msr("classif.ce"))
##    task_id    learner_id resampling_id iteration classif.ce
## 1:  mtcars classif.rpart   subsampling         1        0.0
## 2:  mtcars classif.rpart   subsampling         2        0.5
## 3:  mtcars classif.rpart   subsampling         3        0.0
## 4:  mtcars classif.rpart   subsampling         4        0.0
## 5:  mtcars classif.rpart   subsampling         5        0.0
## Hidden columns: task, learner, resampling, prediction

r4$score(msr("classif.ce"))
##    task_id    learner_id resampling_id iteration classif.ce
## 1:  mtcars classif.rpart     bootstrap         1 0.13333333
## 2:  mtcars classif.rpart     bootstrap         2 0.16666667
## 3:  mtcars classif.rpart     bootstrap         3 0.00000000
## 4:  mtcars classif.rpart     bootstrap         4 0.06666667
## 5:  mtcars classif.rpart     bootstrap         5 0.25000000
## Hidden columns: task, learner, resampling, prediction

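The aggregate() method averages a measure over all iterations. By default it uses the macro average, i.e. the unweighted mean of the intermediate models' scores: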

r2$aggregate(msr("classif.ce"))
## classif.ce 
## 0.03333333

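The micro average instead computes the measure once on the pooled predictions of all test sets; for the classification error this is the same as weighting each intermediate model's score by the size of its test set. Because the data cannot always be split exactly in the requested proportions (here 32 observations do not divide evenly into 5 folds), the test sets of different intermediate models can differ in size, so the macro and micro averages may differ slightly: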

r2$aggregate(msr("classif.ce", average = "micro"))
## classif.ce 
##    0.03125
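Both numbers can be reproduced by hand. The sketch below assumes that the 5 folds of the 32 rows contain 7, 7, 6, 6 and 6 observations and that fold 5 (score 0.1666667 = 1/6) contains 6 observations with exactly one misclassification; both assumptions are consistent with the outputs printed above.

```r
# Worked check of the macro vs micro averages reported for r2.
# Assumed fold sizes and error counts (inferred from the printed scores).
n_test <- c(7, 7, 6, 6, 6)  # test-set size per iteration
n_err  <- c(0, 0, 0, 0, 1)  # misclassified rows per iteration

ce_per_fold <- n_err / n_test
macro <- mean(ce_per_fold)         # each iteration weighted equally: 1/30
micro <- sum(n_err) / sum(n_test)  # each observation weighted equally: 1/32

macro  # 0.03333333
micro  # 0.03125
```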

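1.5 Grouping and stratification

During resampling, it is sometimes necessary to take grouping or stratification of the observations into account.

Grouped resampling means that observations sharing a common characteristic must be assigned together, either all to the training set or all to the test set. To group observations, give a feature the group role: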

## Copy the task so the original is left unchanged
task2 = task$clone()

## Assign the group role
task2$set_col_roles(cols = "cyl", roles = "group")

r21 = resample(task2, learn, resampling = holdout)

## Test set
testid = as.data.table(r21$prediction())$row_ids
mtcars[testid, "cyl"]
##  [1] 8 8 8 8 8 8 8 8 8 8 8 8 8 8

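Running the code above repeatedly shows that the test set always contains a single cyl value: cars sharing a cyl value are never split between the training and test sets. In the run shown, the test set consists of exactly the 14 cars with cyl = 8.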
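A grouped holdout split can be sketched in base R by sampling groups instead of rows. grouped_holdout() is a hypothetical helper, and applying the ratio to the number of groups is an assumption that matches the output above (with 3 cyl groups and ratio = 0.8, two groups go to training and one to testing).

```r
# Sketch of a grouped holdout split: whole groups, not single rows,
# go to the training or test set. grouped_holdout() is hypothetical.
grouped_holdout <- function(group, ratio = 0.8) {
  levels_g <- unique(group)
  n_train_groups <- round(ratio * length(levels_g))
  train_groups <- levels_g[sample.int(length(levels_g), n_train_groups)]
  list(train = which(group %in% train_groups),
       test  = which(!group %in% train_groups))
}

set.seed(1)
s_grp <- grouped_holdout(mtcars$cyl, ratio = 0.8)
unique(mtcars$cyl[s_grp$test])  # a single cyl value: 2 groups train, 1 test
```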

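Stratified resampling means that the levels of a chosen variable appear in the training and test sets in (approximately) the same proportions as in the full data. Stratification is most often applied to the target variable. To stratify, give the variable the stratum role: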

## Copy the task
task3 = task$clone()

## Assign the stratum role (vs keeps its target role as well)
task3$set_col_roles(cols = "vs", roles = c("stratum", "target"))
task3$col_roles
## $feature
##  [1] "am"   "carb" "cyl"  "disp" "drat" "gear" "hp"   "mpg"  "qsec" "wt"  
## 
## $target
## [1] "vs"
## 
## $name
## character(0)
## 
## $order
## character(0)
## 
## $stratum
## [1] "vs"
## 
## $group
## character(0)
## 
## $weight
## character(0)

The vs variable is now both the target and the stratification variable.
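Stratification can be sketched by applying the same ratio within each class of vs. stratified_holdout() is a hypothetical base-R helper rather than mlr3's implementation; it only shows the idea that class proportions are preserved.

```r
# Sketch of a stratified holdout split: the ratio is applied within each
# stratum, so class proportions are kept. stratified_holdout() is hypothetical.
stratified_holdout <- function(strata, ratio = 0.8) {
  train <- unlist(lapply(split(seq_along(strata), strata), function(idx) {
    idx[sample.int(length(idx), round(ratio * length(idx)))]
  }), use.names = FALSE)
  list(train = sort(train), test = setdiff(seq_along(strata), train))
}

set.seed(1)
s_str <- stratified_holdout(mtcars$vs, ratio = 0.8)
table(mtcars$vs)              # full data: 18 zeros, 14 ones
table(mtcars$vs[s_str$train]) # 14 zeros, 11 ones (80% of each class, rounded)
```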

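2 Benchmarking

Benchmarking applies different learners to one or more tasks, resamples them with one or more resampling methods, and compares their performance with one or more measures.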
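benchmark_grid() builds every combination of tasks, learners and resamplings. The expand.grid() sketch below only mimics the shape of that design, using the IDs from the code that follows; it is not the object benchmark_grid() returns. According to the mlr3 documentation, resamplings that are not yet instantiated are instantiated once per task, so all learners are evaluated on the same splits.

```r
# Sketch of the design's shape: full cross product of tasks x learners x resamplings.
design_sketch <- expand.grid(
  task       = "mtcars",
  learner    = c("classif.rpart", "classif.ranger", "classif.kknn"),
  resampling = "cv",
  stringsAsFactors = FALSE
)
nrow(design_sketch)  # 3 = 1 task x 3 learners x 1 resampling
```

With 5-fold cross-validation, these 3 design rows expand to the 15 rows returned by bmr$score().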

Design the benchmark with the benchmark_grid() function:

library(mlr3learners)
task = as_task_classif(mtcars, target = "vs", 
                       positive = "1")
learns = lrns(c("classif.rpart", "classif.ranger", "classif.kknn"), 
              predict_type = "prob")

cv = rsmp("cv", folds = 5)

design = benchmark_grid(tasks = task,
                        learners = learns,
                        resamplings = cv)

Run the benchmark with the benchmark() function:

bmr = benchmark(design)

Compute the scores:

bmr$score(msr("classif.ce"))
##     nr task_id     learner_id resampling_id iteration classif.ce
##  1:  1  mtcars  classif.rpart            cv         1  0.1428571
##  2:  1  mtcars  classif.rpart            cv         2  0.0000000
##  3:  1  mtcars  classif.rpart            cv         3  0.0000000
##  4:  1  mtcars  classif.rpart            cv         4  0.0000000
##  5:  1  mtcars  classif.rpart            cv         5  0.0000000
##  6:  2  mtcars classif.ranger            cv         1  0.1428571
##  7:  2  mtcars classif.ranger            cv         2  0.0000000
##  8:  2  mtcars classif.ranger            cv         3  0.0000000
##  9:  2  mtcars classif.ranger            cv         4  0.1666667
## 10:  2  mtcars classif.ranger            cv         5  0.1666667
## 11:  3  mtcars   classif.kknn            cv         1  0.0000000
## 12:  3  mtcars   classif.kknn            cv         2  0.0000000
## 13:  3  mtcars   classif.kknn            cv         3  0.0000000
## 14:  3  mtcars   classif.kknn            cv         4  0.3333333
## 15:  3  mtcars   classif.kknn            cv         5  0.0000000
## Hidden columns: uhash, task, learner, resampling, prediction

bmr$aggregate(msr("classif.ce"))
##    nr task_id     learner_id resampling_id iters classif.ce
## 1:  1  mtcars  classif.rpart            cv     5 0.02857143
## 2:  2  mtcars classif.ranger            cv     5 0.09523810
## 3:  3  mtcars   classif.kknn            cv     5 0.06666667
## Hidden columns: resample_result
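Since aggregation uses the macro average by default, the aggregated values can be recomputed from the per-fold scores printed by bmr$score(). The base-R sketch below copies those rounded scores by hand.

```r
# Recompute the aggregated classif.ce as the mean of the 5 fold scores
# (values copied from the bmr$score() output above, rounded as printed).
scores <- data.frame(
  learner = rep(c("classif.rpart", "classif.ranger", "classif.kknn"), each = 5),
  ce = c(0.1428571, 0, 0, 0, 0,
         0.1428571, 0, 0, 0.1666667, 0.1666667,
         0, 0, 0, 0.3333333, 0)
)
agg <- aggregate(ce ~ learner, data = scores, FUN = mean)
agg[order(agg$ce), ]  # lowest mean error first: rpart, kknn, ranger
```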

Benchmarking lets us run and compare several learners in a single call.
