kakaobrain/kogpt

Evaluation issue with downstream evaluation codes #17

Closed. wbaek closed this issue 2 years ago.

wbaek commented 2 years ago

It has been reported that our downstream evaluation code has a problem; see the following pull request: https://github.com/haven-jeon/KoGPT2-subtasks/pull/1

We investigated the scope of the problem and confirmed that, in the evaluation table below, only the NSMC fine-tuning accuracy is affected.

| Model | #params | Method | NSMC (Acc.) | KorSTS (Spearman) |
| --- | --- | --- | --- | --- |
| SKT-AI/KoGPT-2 2.0 [2] | 125M | fine-tuning | 93.3 | 78.4 |
| SKT-AI/KoGPT-2 Trinity [3] | 1.2B | fine-tuning | 93.2 | 83.4 |
| HyperCLOVA [1] | 1.3B | p-tuning | 91.7 | - |
| HyperCLOVA [1] | 39.0B | p-tuning | 93.0 | - |
| Ours | 6.0B | fine-tuning | 95.7 | 85.3 |

We will share corrected evaluation results as soon as possible.
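For context, accuracy aggregation is one place downstream evaluation code can silently go wrong. Below is a minimal sketch, not the code from this repository or from KoGPT2-subtasks, of measuring NSMC fine-tuning accuracy with a sequence-classification model; the function name `nsmc_accuracy` and the batch keys are illustrative assumptions. It counts correct predictions over the whole test set rather than averaging per-batch accuracies, since per-batch averaging is biased when the final batch is smaller.

```python
import torch

@torch.no_grad()
def nsmc_accuracy(model, dataloader, device="cuda"):
    """Accuracy counted over all examples: sum(correct) / total.

    Averaging per-batch accuracies instead would over-weight a smaller
    final batch; counting individual examples avoids that bias.
    """
    model.eval()
    correct, total = 0, 0
    for batch in dataloader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = batch["labels"].to(device)
        # Assumes a classification head returning logits of shape
        # (batch, num_labels); NSMC is binary sentiment (0/1).
        logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
        preds = logits.argmax(dim=-1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total
```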

wbaek commented 2 years ago
Updated NSMC results after fixing the evaluation issue:

| Model | #params | Method | NSMC (Acc.) |
| --- | --- | --- | --- |
| SKT-AI/KoGPT-2 2.0 [2] | 125M | fine-tuning | 89.0* |
| SKT-AI/KoGPT-2 Trinity [3] | 1.2B | fine-tuning | 91.1* |
| HyperCLOVA [1] | 1.3B | p-tuning | 91.7 |
| HyperCLOVA [1] | 39.0B | p-tuning | 93.0 |
| Ours | 6.0B | fine-tuning | 91.7 |

[1] HyperCLOVA: Kim, Boseop, et al. "What Changes Can Large-Scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-Scale Korean Generative Pretrained Transformers." arXiv preprint arXiv:2109.04650 (2021).
[2] SKT-AI/KoGPT-2 2.0: "SKT-AI/KoGPT2: Korean GPT-2 pretrained cased (KoGPT2)." https://github.com/SKT-AI/KoGPT2 (2021).
[3] SKT-AI/KoGPT-2 Trinity: "Ko-GPT-Trinity 1.2B." https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5 (2021).
[4] KoGPT2-subtasks: "KoGPT2 v2.0 Korean evaluation module." https://github.com/haven-jeon/KoGPT2-subtasks (2021).