jasperzhong / read-papers-and-code

My paper/code reading notes in Chinese

arXiv '20 | Language Models are Few-Shot Learners #56

Open jasperzhong opened 4 years ago

jasperzhong commented 4 years ago

https://arxiv.org/abs/2005.14165

GPT-3

from: xxxx WeChat official account, Zhihu

jasperzhong commented 4 years ago

Finished reading the intro. It starts by critiquing the pretrain + finetune paradigm, arguing that it isn't flexible enough and may also have generalization problems, etc.

Then it argues that few-shot is the way to go, but previous few-shot results were all poor. They attribute this to the models not being large enough, so they built a very large model to test the idea, and it turns out the results are quite good, sometimes even approaching SOTA.

Note that few-shot here requires no gradient updates; the model only conditions on a task description and a few examples in the prompt. A minimal sketch of this is below.
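A minimal sketch of few-shot in-context conditioning, assuming the Hugging Face `transformers` API. GPT-2 is used here only as a stand-in, since GPT-3 weights are not public; the translation demonstrations are the ones from the paper's figure.

```python
# Few-shot "in-context learning": no gradient updates, the task is specified
# purely through the prompt (description + demonstrations + query).
# GPT-2 is a stand-in model here; GPT-3 itself is not publicly available.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Task description, a few demonstrations, and the query, all in one prompt.
prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivree\n"
    "cheese =>"
)

inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding of a few new tokens conditioned on the prompt only.
outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
```

A small model like GPT-2 will usually not answer correctly; the paper's claim is precisely that this conditioning-only setup starts working once the model is scaled up far enough.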

What I'm actually curious about is how a 175-billion-parameter model was trained in the first place.