Nkluge-correa opened 3 months ago
Hello!
Is there a way to control how many examples are used to evaluate the models? Also, how are the evaluations currently set up: are all benchmarks (ARC, MMLU, HellaSwag) run zero-shot? If not, what configuration does each one use?
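"How many examples" can mean two separate settings: how many test items are scored, and how many solved demonstrations (shots) are placed in each prompt. Zero-shot means the second number is 0. The sketch below is a hypothetical illustration of that distinction only. It does not reflect this project's code. The names `make_eval_prompts`, `num_fewshot`, `limit`, and the `Question:/Answer:` prompt format are all assumptions.

```python
import random
from typing import Dict, List, Optional

# Hypothetical item format: each example has a "question" and an "answer".
Example = Dict[str, str]


def build_prompt(target: Example, shots: List[Example]) -> str:
    """Prepend the solved `shots` to the target question.

    With an empty `shots` list this yields a zero-shot prompt.
    """
    parts = [f"Question: {s['question']}\nAnswer: {s['answer']}" for s in shots]
    parts.append(f"Question: {target['question']}\nAnswer:")
    return "\n\n".join(parts)


def make_eval_prompts(
    test_set: List[Example],
    train_set: List[Example],
    num_fewshot: int = 0,
    limit: Optional[int] = None,
    seed: int = 1234,
) -> List[str]:
    """Return one prompt per evaluated test item.

    `limit` caps how many test items are scored (None means all of them);
    `num_fewshot` sets how many demonstrations precede each item.
    """
    rng = random.Random(seed)  # fixed seed so shot selection is reproducible
    subset = test_set if limit is None else test_set[:limit]
    prompts = []
    for item in subset:
        shots = rng.sample(train_set, num_fewshot)  # empty list when num_fewshot == 0
        prompts.append(build_prompt(item, shots))
    return prompts


# Usage with toy data: 5 scored items, zero-shot vs 5-shot.
train = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(10)]
test = [{"question": f"t{i}", "answer": ""} for i in range(100)]
zero = make_eval_prompts(test, train, num_fewshot=0, limit=5)
five = make_eval_prompts(test, train, num_fewshot=5, limit=5)
```

Keeping the two knobs separate matters when comparing reported scores. A 5-shot result on a 100-item subset is not directly comparable to a zero-shot result on the full benchmark.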