Open Randl opened 7 months ago
The paper evaluates on ARC, HellaSwag, MMLU, and TruthfulQA, but this repo does not reference these evals. It would be nice to add a short explanation of how to run them (e.g., in https://github.com/huggingface/alignment-handbook/tree/main/scripts#evaluating-chat-models).