Open wenhuchen opened 4 months ago
Hi @wenhuchen, what is your sampling configuration? In the past I've seen the sampling configuration affect these evals.
@pengchongjin, I'm using OpenCompass and our previous MAmmoTH eval script. Both reach around 20% on MATH. Would you mind sharing a version that reproduces 24%?
Hi there, thanks for sharing Gemma. It seems that I can't reproduce the 24% 4-shot accuracy on MATH; I'm only getting 20%. Has anyone else tried to reproduce it? What prompt was used?