nlp-uoregon / mlmm-evaluation

Multilingual Large Language Models Evaluation Benchmark
Apache License 2.0

Few-shot configuration #12

Open Nkluge-correa opened 3 months ago

Nkluge-correa commented 3 months ago

Hello!

Is there a way to control how many in-context (few-shot) examples are used when evaluating a model? Also, how are the evaluations currently configured: do all benchmarks (ARC, MMLU, HellaSwag) run in a zero-shot fashion, and if not, what configuration is used?
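
For context, here is how I would expect this to be exposed, assuming this repository keeps the API of EleutherAI's lm-evaluation-harness that it appears to build on. This is only a sketch: `simple_evaluate` and its `num_fewshot` argument come from the upstream harness, and the multilingual task names (e.g. `arc_vi`) and the model path are my guesses, not something I've confirmed in this repo.

```python
# Sketch only: assumes mlmm-evaluation keeps the upstream
# lm-evaluation-harness API, where num_fewshot sets the number
# of in-context examples prepended to each prompt.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                            # HuggingFace causal-LM adapter
    model_args="pretrained=my-org/my-model",      # placeholder model path
    tasks=["arc_vi", "hellaswag_vi", "mmlu_vi"],  # guessed multilingual task names
    num_fewshot=5,                                # 0 would force zero-shot
)
print(results["results"])
```

If something like this is already supported (or if the defaults differ per benchmark, e.g. 25-shot ARC vs. 5-shot MMLU as in the Open LLM Leaderboard), could you document the configuration you use?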