TriviaQA metrics wrong!

When I test the public DCLM-7B model (https://huggingface.co/apple/DCLM-7B) on the TriviaQA small subset, the exact-match score is extremely low.

Eval metrics/triviaqa_sm_sub/0-shot/InContextLearningGenerationExactMatchAccuracy: 0.0003

yaml config:

    label: triviaqa_sm_sub
    dataset_uri: eval/local_data/world_knowledge/triviaqa_sm_sub.jsonl
    num_fewshot: [0, 3, 5]
    icl_task_type: generation_task_with_answers
    do_normalization: true

Samples: {"context": "Question: What media mogul, known as The Mouth of the South, started the first dedicated 24-hour cable news channel, owns the Atlanta braves, founded the Goodwill Games, and married Hanoi Jane?\nAnswer:", "answer": "Ted Turner", "aliases": ["Robert Edward Turner", "Billionaire Ted", "Robert Edward III Turner", "R.E. Turner", "Ted Turner", "Robert Edward Turner III", "Ted Turner Foundation", "Turner Foundation", "Former owner of WCW"]}

Is this low score caused by the aliases not being accepted as correct answers? How can I make the evaluation use the answer aliases?
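To make clear what I mean by using the aliases, here is a minimal sketch of the scoring I would expect. It is not the llm-foundry or Composer implementation: `normalize` and `exact_match_with_aliases` are hypothetical helpers, the SQuAD-style normalization is my assumption about what `do_normalization: true` is meant to do, and scoring only the first generated line is also an assumption.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_with_aliases(prediction: str, sample: dict) -> bool:
    """Return True if the prediction matches the answer or any alias."""
    # Treat the canonical answer and every alias as acceptable (assumption).
    candidates = {sample["answer"], *sample.get("aliases", [])}
    # A base model tends to keep generating, so score only the first line
    # of the output (assumption, not necessarily what the library does).
    lines = prediction.strip().splitlines()
    first_line = lines[0] if lines else ""
    return normalize(first_line) in {normalize(c) for c in candidates}


sample = {
    "answer": "Ted Turner",
    "aliases": ["Robert Edward Turner", "R.E. Turner"],
}
print(exact_match_with_aliases(" Ted Turner\nQuestion: Who ...", sample))
print(exact_match_with_aliases("r.e. turner", sample))
print(exact_match_with_aliases("Ted Turner is a media mogul", sample))
```

With scoring like this, a generation that names any alias on its first line would count as correct, while a longer sentence containing the answer would still count as wrong under strict exact match.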

Thanks!

mosaicml / llm-foundry

Triviaqa metrics wrong! #1557