Unable to reproduce MATH results

google / gemma_pytorch

The official PyTorch implementation of Google's Gemma models

https://ai.google.dev/gemma

Apache License 2.0


Unable to reproduce MATH results #50

Open wenhuchen opened 4 months ago

wenhuchen commented 4 months ago

Hi there, thanks for sharing Gemma. I can't reproduce the reported 24% 4-shot accuracy on MATH; I'm only getting 20%. Has anyone else tried to reproduce it? What prompt was used?

pengchongjin commented 4 months ago

Hi @wenhuchen, what is your sampling configuration? In the past I've seen the sampling configuration affect these evals.
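
For anyone following along, here is a minimal sketch of why decoding settings can move a score. It is not the Gemma sampler or any eval harness's code: the function name `sample_token` and the logits are made up for illustration. With temperature 0 the choice is greedy and repeatable, while temperature 1.0 with top-k spreads the picks over several tokens, so a generated answer can differ from run to run.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=None, rng=None):
    """Pick a token index from raw logits (hypothetical helper).

    temperature == 0 means greedy decoding (argmax), which is deterministic.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    # Rank candidates by score and keep only the top_k best, if requested.
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]
    # Softmax over temperature-scaled logits (subtract the max for stability).
    scaled = [logits[i] / temperature for i in candidates]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Made-up logits for four candidate next tokens.
logits = [2.0, 1.5, 0.3, -1.0]

# Greedy decoding always returns the highest-scoring token.
greedy = [sample_token(logits, temperature=0) for _ in range(5)]

# Sampling with a fixed seed is repeatable, but the picks vary across draws.
rng = random.Random(0)
sampled = [sample_token(logits, temperature=1.0, top_k=3, rng=rng) for _ in range(200)]
```

When comparing numbers across eval scripts, it helps to pin these settings (and the seed) first, so that any remaining gap can be attributed to the prompt or the answer extraction.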

wenhuchen commented 4 months ago

@pengchongjin, I'm using OpenCompass and our earlier MAmmoTH eval script. Both reach around 20% on MATH. Would you mind sharing a version that reproduces 24%?