jasperzhong / read-papers-and-code

My paper/code reading notes in Chinese

ICLR '19 | The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks #214

Closed jasperzhong closed 3 years ago

jasperzhong commented 3 years ago

https://arxiv.org/pdf/1803.03635.pdf

ICLR '19 best paper.

Watched this video: https://www.youtube.com/watch?v=ZVVnvZdUMUk

jasperzhong commented 3 years ago

The Lottery Ticket Hypothesis: A randomly-initialized, dense neural network contains a subnetwork that is initialized such that—when trained in isolation—it can match the test accuracy of the original network after training for at most the same number of iterations.

Model compression can remove about 90% of the parameters and leave a subnetwork. The typical procedure is (a toy sketch follows the list):

  1. train original network
  2. prune
  3. re-train the subnetwork (initialized with the trained parameters of the original network)
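
A minimal PyTorch sketch of this train, prune, re-train pipeline (my own toy illustration, not the paper's code; the architecture, data, and hyperparameters are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def train(model, steps=200):
    # toy training loop on random data, standing in for real MNIST training
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(),
                      nn.Linear(300, 100), nn.ReLU(), nn.Linear(100, 10))

# 1. train the original dense network
train(model)

# 2. prune: globally remove 90% of the smallest-magnitude weights
layers = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(layers, pruning_method=prune.L1Unstructured, amount=0.9)

# 3. re-train the subnetwork, starting from the trained weights
#    (the prune mask keeps the removed weights at zero during re-training)
train(model)
```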

The re-trained subnetwork can reach the original accuracy with no loss. A natural question follows: can we simply train this small subnetwork directly?

The lottery ticket hypothesis' answer is: no.

Why? That is the same as asking why we have to train a large model first. The reason is that the large model is over-parameterized, which lets it exploit the enormous combinatorial space of subnetworks to find a well-initialized subnetwork. That subnetwork is called a winning ticket. But when a winning ticket is randomly reinitialized and then retrained, its performance is poor, which shows that the subnetwork can only be trained effectively when it is appropriately initialized. So if we train the subnetwork directly from randomly initialized parameters, we cannot reach the performance we want.

So they propose a method for identifying winning tickets. Their pruning strategy is iterative pruning: repeatedly train, prune, and reset the network to its original initialization, pruning p^{1/n}% of the weights in each round.
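
A toy sketch of this iterative train, prune, reset loop (again my own illustration, not the authors' code; the per-round pruning fraction below is chosen so that n rounds compound to an overall sparsity of p, standing in for the paper's p^{1/n}% per-round schedule):

```python
import copy
import torch
import torch.nn as nn

def train(model, masks, steps=200):
    # toy training loop on random data; masks are re-applied after every
    # update so pruned weights stay at zero while surviving weights train
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            for m, mask in masks.items():
                m.weight.mul_(mask)

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
theta0 = copy.deepcopy(model.state_dict())       # the original random init

linears = [m for m in model if isinstance(m, nn.Linear)]
masks = {m: torch.ones_like(m.weight) for m in linears}

n_rounds, final_sparsity = 5, 0.9
per_round = 1 - (1 - final_sparsity) ** (1 / n_rounds)

for _ in range(n_rounds):
    train(model, masks)                          # 1. train the current subnetwork
    for m in linears:                            # 2. prune the smallest surviving weights
        alive = masks[m].bool()
        k = int(per_round * int(alive.sum()))
        thresh = m.weight[alive].abs().kthvalue(k).values
        masks[m][m.weight.abs() <= thresh] = 0.0
    model.load_state_dict(theta0)                # 3. reset all weights to theta_0 ...
    with torch.no_grad():
        for m in linears:
            m.weight.mul_(masks[m])              #    ... and re-apply the masks

# The surviving (mask, theta_0) pair is the candidate winning ticket; the
# paper's control experiment keeps the same mask but uses a fresh random
# initialization instead of theta_0, and finds it trains much worse.
```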

The lottery ticket conjecture: Dense, randomly-initialized networks are easier to train than the sparse networks that result from pruning because there are more possible subnetworks from which training might recover a winning ticket.

What this paper shows: we still have to train the large model, and then compress it into a small model before deployment.