iseesaw opened this issue 2 months ago
Hi, please refer to the parameters in this script: https://github.com/tatsu-lab/alpaca_eval/blob/main/src/alpaca_eval/models_configs/gemma-2-9b-it-SimPO/configs.yaml
Hi, I also ran into this problem. I only got WR/LC of 54.47 / 59.97.
Here is my evaluation config:
```yaml
Gemma-2-Aligned-simpo:
  completions_kwargs:
    batch_size: 900
    max_new_tokens: 4096
    model_kwargs:
      dtype: bfloat16
    model_name: princeton-nlp/gemma-2-9b-it-SimPO
    stop_token_ids:
```
The only difference is that I removed `do_sample: true`.
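If I were to turn sampling back on, the config would look roughly like the sketch below. The `temperature` and `top_p` values are placeholders for illustration only, not the official settings; the actual decoding parameters should be copied from the gemma-2-9b-it-SimPO configs.yaml linked above.

```yaml
Gemma-2-Aligned-simpo:
  completions_kwargs:
    batch_size: 900
    max_new_tokens: 4096
    do_sample: true     # re-enabled; the reference config keeps sampling on
    temperature: 0.7    # placeholder, use the value from the linked configs.yaml
    top_p: 0.9          # placeholder, use the value from the linked configs.yaml
    model_kwargs:
      dtype: bfloat16
    model_name: princeton-nlp/gemma-2-9b-it-SimPO
```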
I reviewed your config and your conversation with the AE author on GitHub, and now I’m quite confused. Even after downgrading AlpacaEval 2 to 0.62, I still couldn’t run it with the configuration you provided. The main problem seems to be beam search: should I enable it? If so, the temperature must be set to 0, but I don’t know what beam size to use.
Thank you~
Maybe you used `alpaca_eval_gpt4_turbo_fn` as the annotator. With that setting, the result would be close to the one you reported.
@MaoXinn It’s a bit tricky to interpret what happened from the information you provided. How about we troubleshoot it step by step? You could begin by running the evaluation on AlpacaEval with the outputs we provided and checking whether you get a similar score first.
Hello, how should I set the decoding parameters (e.g., temperature) for Gemma-2? My result is around 50.0, far from the reported benchmark of 76.