lmarena / arena-hard-auto

Arena-Hard-Auto: An automatic LLM benchmark.
Apache License 2.0

[Bug] Temperature is always `0.0` #18

Closed bcui19 closed 5 months ago

bcui19 commented 5 months ago

The temperature for answer generation is always `0.0`, because the category of every question is `arena-hard-v0.1`. As a result, the temperature implicitly falls back to the default of `0.0`. I wanted to flag this since I think it might be undesirable behavior.
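
A minimal sketch of the behavior described above, assuming the temperature is looked up per question category with a fallback default. The names `TEMPERATURE_BY_CATEGORY`, `DEFAULT_TEMPERATURE`, and `resolve_temperature`, as well as the table values, are hypothetical illustrations and not taken from the repository:

```python
# Hypothetical per-category temperature table (illustrative values only).
TEMPERATURE_BY_CATEGORY = {"writing": 0.7, "roleplay": 0.7, "coding": 0.0}

# Hypothetical fallback used when a category has no entry in the table.
DEFAULT_TEMPERATURE = 0.0


def resolve_temperature(category, table=TEMPERATURE_BY_CATEGORY,
                        default=DEFAULT_TEMPERATURE):
    """Return the temperature for a category, or the default if it is unlisted."""
    return table.get(category, default)


# Every question carries the same category, so every lookup hits the fallback.
questions = [
    {"question_id": "q1", "category": "arena-hard-v0.1"},
    {"question_id": "q2", "category": "arena-hard-v0.1"},
]
temps = {resolve_temperature(q["category"]) for q in questions}
print(temps)  # {0.0}
```

Under these assumptions, a category that never appears in the table makes the default the effective temperature for the whole benchmark.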

CodingWithTim commented 5 months ago

Hi, this is not a bug but an intended parameter for Arena Hard.

Since most of the prompts are difficult, problem-solving-oriented questions (many of them very technical), we use temperature 0 so that models produce more accurate answers. We also want to limit the variance of the win rate. Thanks.
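
To illustrate the variance point, here is a toy sketch of temperature sampling over a fixed set of logits. It assumes the common convention that temperature 0 means greedy (argmax) decoding. The function `sample_token` and the logits are made up for illustration, and real model APIs may still show some nondeterminism at temperature 0:

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from logits using temperature-scaled softmax sampling."""
    if temperature == 0.0:
        # Assumed convention: temperature 0 means greedy decoding (argmax).
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.5, 0.5]  # made-up scores for three candidate tokens
rng = random.Random(0)

# Distinct outcomes over 200 draws at each temperature.
greedy = {sample_token(logits, 0.0, rng) for _ in range(200)}
sampled = {sample_token(logits, 1.0, rng) for _ in range(200)}

print(greedy)        # {0}: the same token every time
print(len(sampled))  # more than one distinct token at temperature 1.0
```

In this toy model, temperature 0 returns the same token on every draw, so repeated runs produce the same answer and the resulting win rate varies less between runs.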