buggy evaluation script

We follow the processing strategy outlined in Video-MME. In practice, a model with strong instruction-following capabilities typically does not generate any analysis before its answer, so the impact on the results is minimal. However, if the model you're testing frequently generates analytical content, I recommend using the following evaluation code, which skips everything up to and including the answer prefix.

import re

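def extract_characters_regex(s, choices):
    """Extract the selected option letter (A-E) from the model response `s`.

    `choices` is the list of option strings; returns "" if no option is found.
    """
    s = s.strip()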
    answer_prefixes = [
        "The best answer is",
        "The correct answer is",
        "The answer is",
        "The answer",
        "The best option is",
        "The correct option is",
        "Best answer:",
        "Best option:",
        "Answer:",
        "Option:",
        "The correct answer",
        "The correct option",
    ]

    # Find the text after any of the answer prefixes
    for answer_prefix in answer_prefixes:
        prefix_pattern = re.escape(answer_prefix)
        match = re.search(prefix_pattern, s, re.IGNORECASE)
        if match:
            s = s[match.end():].strip()
            break  # Exit the loop once the relevant prefix is found

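    # After removing the prefix, continue with the existing logic:
    # a long response that contains no option letter counts as unanswered
    if len(s.split()) > 10 and not re.search(r"[ABCDE]", s):
        return ""

    matches = re.search(r"[ABCDE]", s)
    if matches is None:
        # No option letter found: fall back to matching the response against
        # the choice text. Skip empty responses, because "" is a substring of
        # every choice and would otherwise always select the first option.
        for choice in choices:
            if s and s.lower() in choice.lower():
                # Assumes choices are formatted like "(A) ...", so index 1 is the letter
                return choice[1]
        return ""

    return matches[0]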
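
To show why skipping the prefix matters, here is a small self-contained check. It is only a sketch: it repeats a condensed copy of the function (with a guard so that an empty response does not match the first choice) so the snippet runs on its own, and the choices and responses are made up for illustration, assuming options are formatted like "(A) ...".

```python
import re

# Condensed copy of the extraction logic, repeated so this snippet is runnable alone.
ANSWER_PREFIXES = [
    "The best answer is", "The correct answer is", "The answer is", "The answer",
    "The best option is", "The correct option is", "Best answer:", "Best option:",
    "Answer:", "Option:", "The correct answer", "The correct option",
]

def extract_characters_regex(s, choices):
    s = s.strip()
    # Keep only the text after the first prefix (in list order) found in the response
    for prefix in ANSWER_PREFIXES:
        match = re.search(re.escape(prefix), s, re.IGNORECASE)
        if match:
            s = s[match.end():].strip()
            break
    if len(s.split()) > 10 and not re.search(r"[ABCDE]", s):
        return ""
    matches = re.search(r"[ABCDE]", s)
    if matches is None:
        for choice in choices:
            # `s and` guards against an empty response matching every choice
            if s and s.lower() in choice.lower():
                return choice[1]
        return ""
    return matches[0]

# Hypothetical choices and responses, invented for this example.
choices = ["(A) red", "(B) green", "(C) blue"]

direct = extract_characters_regex("B", choices)
with_analysis = extract_characters_regex(
    "Compared with option A, the sign is clearly green. The best answer is (B).",
    choices,
)
text_only = extract_characters_regex("Answer: blue", choices)

print(direct, with_analysis, text_only)  # B B C
```

Without the prefix step, the second response would be scored as "A", because the first capital letter in the range A-E appears inside the analysis rather than in the final answer.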

