openai / simple-evals


Question: Why is Claude running with temperature 0 and GPT-4 with temperature 0.5? Wouldn't that be a handicap for Claude in HumanEval? #13

Status: Open. zurferr opened 1 month ago

js-d commented 1 week ago

I'm also confused by this. Why not use temperature = 1 for both models, since that's the default temperature on the APIs?

Could you link to the "Anthropic example" you have in mind in this comment?

temperature: float = 0.0, # default in Anthropic example

Some additional references:

In all cases where evaluations involve free-form sampling, we evaluate our models at temperature T = 1 to represent normal usage, unless indicated otherwise.

All the results in Table 3 have been obtained by sampling at temperature T = 0.

We test both Claude 3 and Claude 2 model families in 0-shot and 1-shot settings, sampled with temperature T = 1.
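
For anyone comparing the settings quoted above, here is a minimal sketch of what the temperature parameter does at sampling time. It is only an illustration: the logits are made up, the function names `temperature_probs` and `sample_token` are hypothetical, and nothing here comes from simple-evals or from either provider's API. It assumes the common convention that logits are divided by T before the softmax, with T = 0 treated as greedy decoding.

```python
import math
import random


def temperature_probs(logits, temperature):
    """Turn raw logits into sampling probabilities at a given temperature."""
    if temperature == 0:
        # Limit T -> 0: greedy decoding, all mass on the highest logit
        # (the first one wins on ties).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-adjusted distribution."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Made-up logits for three candidate tokens (illustrative only).
logits = [2.0, 1.0, 0.0]
for t in (0.0, 0.5, 1.0):
    print(t, [round(p, 3) for p in temperature_probs(logits, t)])
```

Lower temperatures concentrate probability on the top-scoring token, so a single-sample metric measured at T = 0 and one measured at T = 0.5 or T = 1 are not drawn from the same distribution, which is the concern raised in this issue. Hosted APIs may not be fully deterministic even at temperature 0, so this sketch describes the idealized behavior only.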