AI2 reasoning challenge

stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).

https://crfm.stanford.edu/helm

Apache License 2.0

AI2 reasoning challenge #1695

Status: Open. SuryaThiru opened this issue 1 year ago.

SuryaThiru commented 1 year ago

ARC (the AI2 Reasoning Challenge) is a challenging QA dataset. Is there a reason it isn't included among the available datasets? Are there any plans to add it in the future?

yifanmai commented 1 year ago

We don't have any plans to add ARC at the moment, but you are welcome to open a pull request to add it!

We do have other question-answering scenarios that test commonsense reasoning, including OpenBookQA and CommonsenseQA.
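
For anyone considering such a pull request, here is a rough, framework-independent sketch of the first step: turning one ARC record into a multiple-choice item and a prompt. It does not use HELM's actual scenario classes. `Choice`, `MultipleChoiceItem`, `parse_arc_record`, and `format_prompt` are hypothetical names chosen for illustration, and the record layout (`question`, `choices` with parallel `label` and `text` lists, `answerKey`) is an assumption about how the dataset is commonly distributed, so check it against the copy you download. The sample record is made up and is not taken from ARC.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Choice:
    """One answer option (hypothetical helper type, not a HELM class)."""
    label: str
    text: str
    is_correct: bool


@dataclass(frozen=True)
class MultipleChoiceItem:
    """A question with its labeled answer options (hypothetical helper type)."""
    question: str
    choices: List[Choice]


def parse_arc_record(record: dict) -> MultipleChoiceItem:
    """Convert one record into a MultipleChoiceItem.

    Assumes the record has a 'question' string, a 'choices' dict holding
    parallel 'label' and 'text' lists, and an 'answerKey' naming the
    correct label.
    """
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    answer_key = record["answerKey"]
    if len(labels) != len(texts):
        raise ValueError("'label' and 'text' lists differ in length")
    if answer_key not in labels:
        raise ValueError(f"answerKey {answer_key!r} is not among labels {labels}")
    choices = [
        Choice(label=label, text=text, is_correct=(label == answer_key))
        for label, text in zip(labels, texts)
    ]
    return MultipleChoiceItem(question=record["question"], choices=choices)


def format_prompt(item: MultipleChoiceItem) -> str:
    """Render the item as a plain-text multiple-choice prompt."""
    lines = [f"Question: {item.question}"]
    lines.extend(f"{choice.label}. {choice.text}" for choice in item.choices)
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up record for illustration only; not taken from the ARC dataset.
    sample = {
        "question": "Which object is attracted to a magnet?",
        "choices": {
            "label": ["A", "B", "C", "D"],
            "text": ["a wooden spoon", "an iron nail", "a glass cup", "a rubber band"],
        },
        "answerKey": "B",
    }
    print(format_prompt(parse_arc_record(sample)))
```

A real contribution would still need to wrap this logic in whatever scenario interface HELM expects and handle the dataset's train, validation, and test splits; the existing OpenBookQA and CommonsenseQA scenarios mentioned above are probably the closest templates to follow.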