Open DerryChan opened 4 weeks ago
Hi @DerryChan, unfortunately this is something we don't plan to support for the default built-in MMLU scenario.
Some suggestions that you could try for your use case:
- The max_tokens parameter is set to 1 for MMLU, which will usually cause the model to output only the letter.
- Adding output_format_instructions=mmlu to your run entry (e.g. mmlu:output_format_instructions=mmlu,model=text) will add "Answer with only a single letter." to the prompt.
Description
During the MMLU evaluation of our LLM, we encountered an issue where correct answers are being marked as incorrect due to format mismatches. Specifically, when the model outputs the correct numerical answer but includes a preceding letter (e.g., "C. 12" instead of just "12"), the scoring system fails to recognize it as correct.
Current Behavior
The scoring system marks answers as incorrect if they don't exactly match the reference answer format, even if the numerical value is correct.
Expected Behavior
The scoring system should be able to correctly identify and score answers that are numerically correct, regardless of minor formatting differences such as preceding letters.
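To make the requested behavior concrete, here is a minimal sketch of a lenient comparison that strips a leading option label before comparing. It is only an illustration of the idea, not how the existing scorer works: `normalize_answer` and `lenient_match` are hypothetical names, and the sketch assumes labels are a single letter A-D followed by `.`, `:` or `)`.

```python
import re

# Hypothetical helpers for illustration; not part of any evaluation framework.
# Assumption: an option label is one letter A-D, optionally preceded by "(",
# and always followed by ".", ":" or ")" (e.g. "C. ", "C: ", "(C) ").
_OPTION_LABEL = re.compile(r"^\(?[A-D][.:)]\s*")


def normalize_answer(text: str) -> str:
    """Strip surrounding whitespace and one leading option label, if present."""
    stripped = text.strip()
    without_label = _OPTION_LABEL.sub("", stripped, count=1)
    # If stripping the label leaves nothing (e.g. "C."), keep the original text.
    return (without_label or stripped).strip()


def lenient_match(prediction: str, reference: str) -> bool:
    """Compare two answers after removing any leading option label."""
    return normalize_answer(prediction) == normalize_answer(reference)


if __name__ == "__main__":
    print(lenient_match("C. 12", "12"))
    print(lenient_match("C. 13", "12"))
```

Requiring punctuation after the letter is a deliberate choice: it avoids stripping the article from an answer such as "A dog", at the cost of not handling a bare "C 12".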