open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
4.15k stars 438 forks

How to set temperature when using an LLM as judge? #1379

Closed may012345 closed 3 months ago

may012345 commented 3 months ago

Describe the feature

Do I need to set temperature = 0 when using an LLM as judge? Otherwise, the score is different on every run.

Will you implement it?

bittersweet1999 commented 3 months ago

Yes. In subjective evaluation, we often set temperature=1 for model inference, but set temperature=0 when using an LLM as judge.
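
To illustrate why the judge score changes between runs at a non-zero temperature, here is a minimal sketch in plain Python. It is not OpenCompass code: the `sample_token` and `judge` helpers, the logits, and the 1 to 5 score scale are all made up for the example, and temperature 0 is treated as greedy argmax, which is the usual convention. How the temperature is actually passed to a judge model depends on the model wrapper you use, so check its parameters.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick an index from logits. Temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale logits by temperature, then sample from the softmax distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical judge preferences over the scores 1..5 (illustrative values only).
SCORES = [1, 2, 3, 4, 5]
LOGITS = [0.1, 0.5, 1.2, 1.5, 1.0]

def judge(temperature, seed):
    """Simulate one judge run; the seed stands in for run-to-run randomness."""
    rng = random.Random(seed)
    return SCORES[sample_token(LOGITS, temperature, rng)]

# Repeat the "same" judgment 20 times at each temperature.
greedy_runs = {judge(0, seed) for seed in range(20)}
sampled_runs = {judge(1, seed) for seed in range(20)}

print("temperature=0 scores:", greedy_runs)   # always the highest-logit score
print("temperature=1 scores:", sampled_runs)  # varies from run to run
```

With temperature 0 every run returns the same score, while temperature 1 spreads the runs over several scores, which matches the behavior described in the question.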

tonysy commented 3 months ago

Feel free to re-open if needed.