stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
https://crfm.stanford.edu/helm
Apache License 2.0

list of the questions used for each of the MMLU subsets #2673

Closed surya-narayanan closed 2 months ago

surya-narayanan commented 2 months ago

Hi,

How did you choose the 5 questions used in the context window? I see they're different for each subset, and I was wondering whether there's a JSON file somewhere that I can use.

Regards, Surya

yifanmai commented 2 months ago

Hi Surya, the 5 questions are sampled from the train split using the code here. We don't have a JSON file for the 5 examples, but you can either extract them from the raw text shown on the MMLU leaderboards, or run the code yourself and export the examples.
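For readers who want to try the "run it yourself and export" route, here is a minimal sketch of the general idea: draw 5 examples per subset from a train split with a fixed seed, then write them to a JSON file. This is not HELM's actual code. The function names (`sample_few_shot`, `export_few_shot`), the record layout, and the seed are assumptions made for illustration, and HELM's own sampling logic may differ, so the code linked above remains the authoritative reference.

```python
import json
import random
from typing import Dict, List


def sample_few_shot(train_split: List[Dict], k: int = 5, seed: int = 0) -> List[Dict]:
    """Pick k examples from a train split, reproducibly for a given seed.

    Hypothetical helper: this illustrates seeded sampling in general,
    not the sampling procedure HELM itself uses.
    """
    if k > len(train_split):
        raise ValueError(f"need at least {k} train examples, got {len(train_split)}")
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    return rng.sample(train_split, k)


def export_few_shot(
    subsets: Dict[str, List[Dict]], path: str, k: int = 5, seed: int = 0
) -> Dict[str, List[Dict]]:
    """Sample k examples for every subset and write them all to one JSON file."""
    chosen = {name: sample_few_shot(rows, k, seed) for name, rows in subsets.items()}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(chosen, f, indent=2, ensure_ascii=False)
    return chosen


if __name__ == "__main__":
    # Toy stand-in for one MMLU subset's train split (made-up records).
    toy = {
        "abstract_algebra": [
            {"question": f"Question {i}", "choices": ["A", "B", "C", "D"], "answer": "A"}
            for i in range(20)
        ]
    }
    export_few_shot(toy, "few_shot_examples.json")
```

Because the seed is fixed, rerunning the script on the same train split selects the same 5 examples, so the exported file can be used as a stable reference.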

surya-narayanan commented 2 months ago

Thanks!