krystalan / chatgpt_as_nlg_evaluator

Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study
https://arxiv.org/abs/2303.04048
41 stars 1 forks source link

request for more details #1

Open Shen-Chenhui opened 1 year ago

Shen-Chenhui commented 1 year ago

Hi, could you please kindly provide more details regarding experimental settings? Specifically for SummEval,

krystalan commented 1 year ago

Hi, Sorry for the really late reply.

The OpenAI ChatGPT did not release the official API when we did the experiments. Thus, there might be gaps when you reproduce the results using the official API.

Currently, I recommend setting the temperature to zero in official APIs and using the gpt-3.5-turbo model. Empirically, I find that when setting the temperature to zero, the gpt-3.5-turbo model will directly produce the final scores without any explanations. If you want to collect explanations, try to raise the temperature.

Please feel free to drop me emails for any other questions.