jasperzhong / read-papers-and-code

My paper/code reading notes in Chinese

ICLR '19 | The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks #214

Closed jasperzhong closed 3 years ago

jasperzhong commented 3 years ago

https://arxiv.org/pdf/1803.03635.pdf

ICLR '19 best paper.

Watched this video: https://www.youtube.com/watch?v=ZVVnvZdUMUk

jasperzhong commented 3 years ago

The Lottery Ticket Hypothesis: A randomly-initialized, dense neural network contains a subnetwork that is initialized such that—when trained in isolation—it can match the test accuracy of the original network after training for at most the same number of iterations.

Model compression can remove about 90% of the parameters and leave a subnetwork. The typical procedure is (a toy sketch follows the list):

  1. train original network
  2. prune
  3. re-train the subnetwork (initialized with the trained parameters of the original network)
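
A minimal PyTorch sketch of this train, prune, re-train pipeline (my own toy illustration, not the paper's code; the architecture, data, and hyperparameters are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def train(model, steps=200):
    # toy training loop on random data, standing in for real MNIST training
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(),
                      nn.Linear(300, 100), nn.ReLU(), nn.Linear(100, 10))

# 1. train the original dense network
train(model)

# 2. prune: globally remove 90% of the smallest-magnitude weights
layers = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(layers, pruning_method=prune.L1Unstructured, amount=0.9)

# 3. re-train the subnetwork, starting from the trained weights
#    (the prune mask keeps the removed weights at zero during re-training)
train(model)
```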

The re-trained subnetwork can reach the original accuracy with no loss. A natural question follows: can we simply train this small subnetwork directly?

The lottery ticket hypothesis' answer is: no.

Why? That is the same as asking why we have to train a large model first. The reason is that the large model is over-parameterized, which lets it exploit the enormous combinatorial space of subnetworks to find a well-initialized subnetwork. That subnetwork is called a winning ticket. But when a winning ticket is randomly reinitialized and then retrained, its performance is poor, which shows that the subnetwork can only be trained effectively when it is appropriately initialized. So if we train the subnetwork directly from randomly initialized parameters, we cannot reach the performance we want.

So they propose a method for identifying winning tickets. Their pruning strategy is iterative pruning: repeatedly train, prune, and reset the network to its original initialization, pruning p^{1/n}% of the weights in each round.
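
A toy sketch of this iterative train, prune, reset loop (again my own illustration, not the authors' code; the per-round pruning fraction below is chosen so that n rounds compound to an overall sparsity of p, standing in for the paper's p^{1/n}% per-round schedule):

```python
import copy
import torch
import torch.nn as nn

def train(model, masks, steps=200):
    # toy training loop on random data; masks are re-applied after every
    # update so pruned weights stay at zero while surviving weights train
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            for m, mask in masks.items():
                m.weight.mul_(mask)

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
theta0 = copy.deepcopy(model.state_dict())       # the original random init

linears = [m for m in model if isinstance(m, nn.Linear)]
masks = {m: torch.ones_like(m.weight) for m in linears}

n_rounds, final_sparsity = 5, 0.9
per_round = 1 - (1 - final_sparsity) ** (1 / n_rounds)

for _ in range(n_rounds):
    train(model, masks)                          # 1. train the current subnetwork
    for m in linears:                            # 2. prune the smallest surviving weights
        alive = masks[m].bool()
        k = int(per_round * int(alive.sum()))
        thresh = m.weight[alive].abs().kthvalue(k).values
        masks[m][m.weight.abs() <= thresh] = 0.0
    model.load_state_dict(theta0)                # 3. reset all weights to theta_0 ...
    with torch.no_grad():
        for m in linears:
            m.weight.mul_(masks[m])              #    ... and re-apply the masks

# The surviving (mask, theta_0) pair is the candidate winning ticket; the
# paper's control experiment keeps the same mask but uses a fresh random
# initialization instead of theta_0, and finds it trains much worse.
```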

The lottery ticket conjecture: Dense, randomly-initialized networks are easier to train than the sparse networks that result from pruning because there are more possible subnetworks from which training might recover a winning ticket.

What this paper shows: we still have to train the large model, and then compress it into a small model before deployment.