Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
Some instruction-tuned models will try to answer the first question (i.e. the in-context learning question) instead of the last one. This change adds a way for the output_format_instructions run expander to tell the models to answer the last question.
Some instruction-tuned models will try to answer the first question (i.e. the in-context learning question) instead of the last one. This change adds a way for the
output_format_instructions
run expander to tell the models to answer the last question.