mosaicml / llm-foundry

LLM training code for Databricks foundation models
https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Apache License 2.0

Triviaqa metrics wrong! #1557

Open lqniunjunlper opened 1 week ago

lqniunjunlper commented 1 week ago

When I test the public DCLM-7B model (https://huggingface.co/apple/DCLM-7B) on the TriviaQA small subset, the exact-match accuracy is extremely low:

Eval metrics/triviaqa_sm_sub/0-shot/InContextLearningGenerationExactMatchAccuracy: 0.0003

yaml config:
label: triviaqa_sm_sub
dataset_uri: eval/local_data/world_knowledge/triviaqa_sm_sub.jsonl
num_fewshot: [0, 3, 5]
icl_task_type: generation_task_with_answers
do_normalization: true

Samples: {"context": "Question: What media mogul, known as The Mouth of the South, started the first dedicated 24-hour cable news channel, owns the Atlanta braves, founded the Goodwill Games, and married Hanoi Jane?\nAnswer:", "answer": "Ted Turner", "aliases": ["Robert Edward Turner", "Billionaire Ted", "Robert Edward III Turner", "R.E. Turner", "Ted Turner", "Robert Edward Turner III", "Ted Turner Foundation", "Turner Foundation", "Former owner of WCW"]}

Is this low score caused by the aliases not being accepted as correct answers? How can the aliases of an answer be used during evaluation?
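To make the question concrete, here is a minimal sketch of the behaviour I would expect from an alias-aware exact-match check. It is my own illustration and not llm-foundry's actual implementation: the function names `normalize_answer` and `exact_match_with_aliases` are hypothetical, and the normalization (lowercasing, stripping punctuation and articles) is an assumption modelled on common QA evaluation practice.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (assumed scheme)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_with_aliases(prediction: str, answer: str, aliases: list[str]) -> bool:
    """Return True if the first generated line matches the answer or any alias."""
    # Keep only the first line, since a base model often keeps generating
    # further "Question: ..." text after the answer.
    first_line = prediction.strip().split("\n")[0]
    pred = normalize_answer(first_line)
    candidates = {normalize_answer(c) for c in [answer, *aliases]}
    candidates.discard("")  # ignore aliases that normalize to nothing
    return pred in candidates


sample = {
    "answer": "Ted Turner",
    "aliases": ["Robert Edward Turner", "R.E. Turner", "Ted Turner"],
}
print(exact_match_with_aliases(" Ted Turner\nQuestion: Who", **sample))
print(exact_match_with_aliases("Ted Turner is a media mogul", **sample))
```

Under this sketch the first call matches and the second does not, because a strict exact match rejects any generation that continues past the answer on the same line. If the metric behaves like this, verbose 0-shot output alone could explain a score near zero even when the aliases are accepted.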

Thanks!

lqniunjunlper commented 1 week ago

The llm_foundry version is 0.10.0.