jasperzhong, closed 3 years ago
The Lottery Ticket Hypothesis: A randomly-initialized, dense neural network contains a subnetwork that is initialized such that—when trained in isolation—it can match the test accuracy of the original network after training for at most the same number of iterations.
Model compression can remove roughly 90% of a network's parameters, leaving a subnetwork. The usual procedure is: train the dense network, prune the lowest-magnitude weights, then re-train the surviving subnetwork.
The re-trained subnetwork can reach no accuracy loss relative to the original. This naturally raises a question: can we skip the large network and directly train this small subnetwork from scratch?

The lottery ticket paper's answer is: no.
Why not? This is equivalent to asking why we must train a large model first. The reason is that the large model is over-parameterized, which lets it exploit the enormous combinatorial space of subnetworks to find a well-initialized subnetwork. That subnetwork is called the winning ticket. But when the winning ticket is randomly reinitialized and then retrained, its performance is much worse, which shows that the subnetwork can be trained effectively only when it is appropriately initialized. So if we train the subnetwork directly from scratch, its parameters are randomly initialized, and it cannot reach the accuracy we want.
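The reinitialization control experiment above can be sketched as follows. `train_and_eval` is a hypothetical stand-in for full training plus test evaluation; only the structure of the comparison (same mask, original init vs. fresh random init) comes from the paper.

```python
import numpy as np

def winning_ticket_control(theta0, mask, train_and_eval, rng):
    """Compare a winning ticket against a randomly reinitialized subnetwork.

    theta0: the dense network's original random initialization (flat array)
    mask:   binary mask selecting the pruned subnetwork
    train_and_eval: hypothetical callable (weights, mask) -> test accuracy
    """
    # Winning ticket: the subnetwork with its ORIGINAL initialization.
    acc_ticket = train_and_eval(theta0 * mask, mask)

    # Control: identical sparse structure, but a FRESH random initialization.
    theta_new = rng.normal(size=theta0.shape)
    acc_reinit = train_and_eval(theta_new * mask, mask)

    # The paper observes acc_ticket >> acc_reinit: the mask alone is not
    # enough; the subnetwork must also be appropriately initialized.
    return acc_ticket, acc_reinit
```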
So the authors propose a method for identifying winning tickets. Their pruning strategy is iterative pruning: repeatedly train, prune, and reset the surviving weights to their original initialization. To prune p% of the weights in total over n rounds, each round prunes p^{1/n}% of the weights that survive the previous round.
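A minimal numpy sketch of this train/prune/reset loop, assuming a hypothetical `train_fn` that stands in for full training and flat weight arrays rather than per-layer tensors. For the per-round rate I use 1-(1-p)^{1/n} of the surviving weights, which makes a total fraction p pruned after n rounds (the paper writes this rate as p^{1/n}%).

```python
import numpy as np

def prune_lowest_magnitude(weights, mask, prune_frac):
    """Zero out the prune_frac fraction of surviving weights with smallest |w|."""
    surviving = np.abs(weights[mask.astype(bool)])
    k = int(len(surviving) * prune_frac)
    if k == 0:
        return mask
    threshold = np.sort(surviving)[k - 1]
    # Keep only surviving weights strictly above the threshold.
    return mask * (np.abs(weights) > threshold)

def iterative_magnitude_pruning(init_weights, train_fn, total_prune_frac, n_rounds):
    """Lottery-ticket style iterative pruning: train, prune, reset, repeat.

    init_weights:     the original random initialization (flat array)
    train_fn:         hypothetical callable (weights, mask) -> trained weights
    total_prune_frac: fraction of all weights to prune after n_rounds
    """
    # Per-round rate so that a total fraction total_prune_frac is pruned.
    per_round = 1 - (1 - total_prune_frac) ** (1 / n_rounds)
    mask = np.ones_like(init_weights)
    for _ in range(n_rounds):
        trained = train_fn(init_weights * mask, mask)   # train the masked subnetwork
        mask = prune_lowest_magnitude(trained, mask, per_round)
        # Reset: the next round starts again from init_weights * mask,
        # i.e. surviving weights go back to their ORIGINAL initialization.
    return init_weights * mask, mask
```

With an identity `train_fn` (pruning by initial magnitude only), pruning 90% of 100 weights over 2 rounds leaves 11 surviving weights, close to the 10 targeted; integer rounding of the per-round count causes the small gap.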
The lottery ticket conjecture: Dense, randomly-initialized networks are easier to train than the sparse networks that result from pruning because there are more possible subnetworks from which training might recover a winning ticket.
The takeaway of this paper: we still need to train the large model first, then compress it into a small one before deployment.
https://arxiv.org/pdf/1803.03635.pdf
ICLR '19 best paper.
Watched this video: https://www.youtube.com/watch?v=ZVVnvZdUMUk