Codex for math scenarios

stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).

https://crfm.stanford.edu/helm

Apache License 2.0


Codex for math scenarios #534

Closed teetone closed 2 years ago

teetone commented 2 years ago

Is it for math, numeracy, synthetic_reasoning, and synthetic_reasoning_natural?

tonywu95 commented 2 years ago

In addition to these tasks, please also run it for gsm8k, legal_support, babi, lsat, dyck, entity_matching, human_eval, APPS, and entity_data_imputation. Thanks.

rishibommasani commented 2 years ago

Closed, since we are now running it for all reasoning scenarios, based on https://github.com/stanford-crfm/benchmarking/commit/ced441bb902aa2c615533aba3ccdac8ddb2c61c7