princeton-nlp / SimPO

[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward

Decoding parameters (e.g., temperature) for Gemma-2? #64


iseesaw commented 2 months ago

Hello, how should I set the decoding parameters (e.g., temperature) for Gemma-2? My result is around 50.0, far from the reported benchmark of 76.

xiamengzhou commented 2 months ago

Hi, please refer to the parameters in this config: https://github.com/tatsu-lab/alpaca_eval/blob/main/src/alpaca_eval/models_configs/gemma-2-9b-it-SimPO/configs.yaml
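
For context, AlpacaEval model configs follow the general shape below. Every value here is an illustrative placeholder (the backend name, prompt path, and decoding values are assumptions); the authoritative settings are in the linked `configs.yaml`:

```yaml
# Sketch of an AlpacaEval model config; all values are placeholders,
# NOT the actual gemma-2-9b-it-SimPO settings.
gemma-2-9b-it-SimPO:
  prompt_template: "gemma-2-9b-it-SimPO/prompt.txt"  # assumed prompt file path
  fn_completions: "huggingface_local_completions"    # assumed completion backend
  completions_kwargs:
    model_name: "princeton-nlp/gemma-2-9b-it-SimPO"
    max_new_tokens: 4096   # placeholder
    do_sample: true        # sampling enabled; actual flags are in the linked file
    temperature: 0.7       # placeholder; take the real value from the linked file
    top_p: 0.9             # placeholder
```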

MaoXinn commented 2 months ago

Hi, I also ran into this problem. I only got the following WR/LC: 54.47204968944099 / 59.969975205397596.

Here is my evaluation config:

```yaml
Gemma-2-Aligned-simpo:
  completions_kwargs:
    batch_size: 900
    max_new_tokens: 4096
    model_kwargs:
      dtype: bfloat16
    model_name: princeton-nlp/gemma-2-9b-it-SimPO
    stop_token_ids:
```

The only difference is that I removed `do_sample: true`.

I reviewed your config and your conversation with the AE author on GitHub, and now I’m quite confused. Even after downgrading AE2 to 0.62, I still couldn’t run it based on the configuration you provided. The main problem seems to lie with beam search. Should I enable beam search? If so, the temperature must be set to 0, but I don’t know what the beam size should be.
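
For reference, a beam-search setup in Hugging Face generation kwargs would look roughly like the sketch below. This only illustrates how the knobs interact; it is not a claim that the SimPO config uses beam search:

```yaml
# Hypothetical generation kwargs for beam search (illustrative only):
completions_kwargs:
  num_beams: 5       # placeholder beam width
  do_sample: false   # beam search is deterministic, so sampling is off
  # with do_sample: false, temperature is ignored rather than "set to 0"
```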

Thank you~

LotuSrc commented 2 months ago

Maybe you used `alpaca_eval_gpt4_turbo_fn`. With that annotator, the result is close to the one you reported.

xiamengzhou commented 1 month ago

@MaoXinn It’s a bit tricky to interpret what happened from the information you provided. How about we troubleshoot it step by step? You could begin by running the AlpacaEval evaluation on the outputs we provided, and check whether you get a similar score first.
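
For anyone following along, a minimal way to score a precomputed outputs file with the AlpacaEval CLI looks like this; the JSON path is a placeholder for wherever you saved the released generations:

```bash
# Score an existing outputs file with AlpacaEval 2's default annotator.
alpaca_eval --model_outputs path/to/simpo_outputs.json

# Optionally pin the annotator explicitly to match a specific setup:
alpaca_eval --model_outputs path/to/simpo_outputs.json \
  --annotators_config weighted_alpaca_eval_gpt4_turbo
```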