Closed AdamSobieski closed 1 month ago
Thanks for the suggestions! StrategyQA is very relevant to HELM for assessing implicit reasoning, so I think adding it would be a good idea.
Closing this issue due to staleness. Feel free to reopen it if you're planning to work on StrategyQA further.
Hello. I am interested in the evaluation of AI systems and LLMs both in general and specifically with respect to reading comprehension, story comprehension (e.g., NarrativeQA), question-answering, and question-answering strategies.
With respect to AI evaluation, in general, I find interesting the forefront R&D topics of:
With respect to question-answering strategies, in particular, has the StrategyQA dataset been considered for HELM? Thank you.
References
[1] Laverghetta Jr, Antonio, and John Licato. "Generating better items for cognitive assessments using large language models." (2023).
[2] Olney, Andrew M. "Generating multiple choice questions from a textbook: LLMs match human performance on most metrics." In AIED Workshops. 2023.