wbaek closed this issue 2 years ago
Models | #params | method | NSMC (Acc.) |
---|---|---|---|
SKT-AI/KoGPT-2 2.0[2] | 125M | finetuning | 89.0* |
SKT-AI/KoGPT-2 Trinity[3] | 1.2B | finetuning | 91.1* |
HyperCLOVA[1] | 1.3B | p-tuning | 91.7 |
HyperCLOVA[1] | 39.0B | p-tuning | 93.0 |
Ours | 6.0B | finetuning | 91.7 |
[1] HyperCLOVA: Kim, Boseop, et al. "What changes can large-scale language models bring? intensive study on hyperclova: Billions-scale korean generative pretrained transformers." arXiv preprint arXiv:2109.04650 (2021).
[2] SKT-AI/KoGPT-2 2.0: "SKT-AI/KoGPT2: Korean GPT-2 pretrained cased (KoGPT2)." https://github.com/SKT-AI/KoGPT2 (2021).
[3] SKT-AI/KoGPT-2 Trinity: "Ko-GPT-Trinity 1.2B." https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5 (2021).
[4] KoGPT2-subtasks: "KoGPT2 v2.0 Korean evaluation module (한국어 평가 모듈)." https://github.com/haven-jeon/KoGPT2-subtasks (2021).
It has been reported that our downstream evaluation is affected by an issue, described in the following pull request: https://github.com/haven-jeon/KoGPT2-subtasks/pull/1
We investigated the scope of the problem and confirmed that, in the evaluation table above, only the NSMC finetuning accuracy is affected.
We plan to share corrected evaluation results as soon as possible.
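For context, the NSMC numbers above are plain binary-classification accuracy (0 = negative, 1 = positive review). The sketch below shows that metric in isolation; it is an assumption based on the public NSMC benchmark, not the exact evaluation code referenced in the linked pull request.

```python
# Minimal sketch of NSMC-style accuracy (binary sentiment classification).
# The toy labels below are illustrative, not real benchmark outputs.

def accuracy(preds, labels):
    """Fraction of predictions that match the gold labels."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must be the same length")
    correct = sum(p == g for p, g in zip(preds, labels))
    return correct / len(labels)

# Toy example: 4 of 5 predictions match the gold labels -> 0.8,
# which the table would report as 80.0.
preds = [1, 0, 1, 1, 0]
labels = [1, 0, 0, 1, 0]
print(f"NSMC accuracy: {accuracy(preds, labels) * 100:.1f}")
```

A bug anywhere in this pipeline (e.g. in label parsing or the comparison step, as in the linked PR) shifts the reported accuracy, which is why only the finetuned NSMC column needs re-evaluation.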