There seems to be a discrepancy between the leaderboard and this repository, which may end up meaning that models were benchmarked with different settings than reported.
Specifically, `m_hellaswag` seems to specify 0 few-shot examples even though the leaderboard says 10:
https://github.com/laiviet/lm-evaluation-harness/blob/10cb5292748e882c22db7eed49a380089645c4c2/lm_eval/tasks/multilingual_hellaswag.py#L48-L56
Similarly, MMLU has 5 few-shot examples on the leaderboard but 25 in the code:
https://github.com/laiviet/lm-evaluation-harness/blob/10cb5292748e882c22db7eed49a380089645c4c2/lm_eval/tasks/multilingual_mmlu.py#L44-L48
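For what it's worth, one way to make the intended setting unambiguous would be to pass the few-shot count explicitly at evaluation time rather than relying on the task's default. The sketch below assumes this fork still exposes the upstream lm-evaluation-harness `simple_evaluate` entry point and that an explicit `num_fewshot` takes precedence over the value hard-coded in the task file; I haven't verified either for this repository, and the model choice is just for illustration.

```python
# Sketch only: pin the few-shot count explicitly instead of relying on the
# value hard-coded in the task file. Assumes the fork keeps the upstream
# lm-evaluation-harness API (lm_eval.evaluator.simple_evaluate with a
# num_fewshot argument) and that the explicit value overrides the task default.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="gpt2",                  # hypothetical model, for illustration only
    model_args="pretrained=gpt2",
    tasks=["m_hellaswag"],         # the multilingual HellaSwag task discussed above
    num_fewshot=10,                # the few-shot count reported on the leaderboard
)
print(results["results"])
```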
Where should it be corrected - on the leaderboard or in the code? And what are the consequences for the models that you report?