kakaobrain/kogpt

Evaluation issue with downstream evaluation codes #17

Closed. wbaek closed this issue 2 years ago.

wbaek commented 2 years ago

It has been reported that our downstream evaluation code has a problem; see the following pull request: https://github.com/haven-jeon/KoGPT2-subtasks/pull/1

We investigated the scope of the problem and confirmed that, in the evaluation table below, only the NSMC fine-tuning accuracy is affected.

| Model | #params | Method | NSMC (Acc.) | KorSTS (Spearman) |
| --- | --- | --- | --- | --- |
| SKT-AI/KoGPT-2 2.0 [2] | 125M | fine-tuning | 93.3 | 78.4 |
| SKT-AI/KoGPT-2 Trinity [3] | 1.2B | fine-tuning | 93.2 | 83.4 |
| HyperCLOVA [1] | 1.3B | p-tuning | 91.7 | - |
| HyperCLOVA [1] | 39.0B | p-tuning | 93.0 | - |
| Ours | 6.0B | fine-tuning | 95.7 | 85.3 |

We will share corrected evaluation results as soon as possible.
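For context, accuracy aggregation is one place downstream evaluation code can silently go wrong. Below is a minimal sketch, not the code from this repository or from KoGPT2-subtasks, of measuring NSMC fine-tuning accuracy with a sequence-classification model; the function name `nsmc_accuracy` and the batch keys are illustrative assumptions. It counts correct predictions over the whole test set rather than averaging per-batch accuracies, since per-batch averaging is biased when the final batch is smaller.

```python
import torch

@torch.no_grad()
def nsmc_accuracy(model, dataloader, device="cuda"):
    """Accuracy counted over all examples: sum(correct) / total.

    Averaging per-batch accuracies instead would over-weight a smaller
    final batch; counting individual examples avoids that bias.
    """
    model.eval()
    correct, total = 0, 0
    for batch in dataloader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = batch["labels"].to(device)
        # Assumes a classification head returning logits of shape
        # (batch, num_labels); NSMC is binary sentiment (0/1).
        logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
        preds = logits.argmax(dim=-1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total
```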

wbaek commented 2 years ago
Updated NSMC results after fixing the evaluation issue:

| Model | #params | Method | NSMC (Acc.) |
| --- | --- | --- | --- |
| SKT-AI/KoGPT-2 2.0 [2] | 125M | fine-tuning | 89.0* |
| SKT-AI/KoGPT-2 Trinity [3] | 1.2B | fine-tuning | 91.1* |
| HyperCLOVA [1] | 1.3B | p-tuning | 91.7 |
| HyperCLOVA [1] | 39.0B | p-tuning | 93.0 |
| Ours | 6.0B | fine-tuning | 91.7 |

[1] HyperCLOVA: Kim, Boseop, et al. "What Changes Can Large-Scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-Scale Korean Generative Pretrained Transformers." arXiv preprint arXiv:2109.04650 (2021).
[2] SKT-AI/KoGPT-2 2.0: "SKT-AI/KoGPT2: Korean GPT-2 pretrained cased (KoGPT2)." https://github.com/SKT-AI/KoGPT2 (2021).
[3] SKT-AI/KoGPT-2 Trinity: "Ko-GPT-Trinity 1.2B." https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5 (2021).
[4] KoGPT2-subtasks: "KoGPT2 v2.0 Korean evaluation module." https://github.com/haven-jeon/KoGPT2-subtasks (2021).