stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
https://crfm.stanford.edu/helm
Apache License 2.0

Same metrics counted twice in HEIM? #2060

Open · zhimin-z opened this issue 9 months ago

zhimin-z commented 9 months ago

There is a discrepancy between the number of evaluation metrics stated for HEIM and the metrics actually used in the evaluation; some metrics appear to be counted twice. (Screenshots of the metric listings were attached.)
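One quick way to check for double-counting is to compare the stated metric count against the number of unique metric names. A minimal sketch (the metric names below are illustrative placeholders, not taken from the actual HEIM leaderboard):

```python
from collections import Counter

# Hypothetical list of metric names as they might appear on a results
# page; in practice these would be scraped from the HEIM leaderboard.
metrics = [
    "CLIPScore", "Aesthetics", "Fractal dimension",
    "CLIPScore",  # a duplicate entry would inflate the stated count
]

counts = Counter(metrics)
duplicates = {name: n for name, n in counts.items() if n > 1}

print(duplicates)                  # metric names listed more than once
print(len(metrics), len(counts))  # stated count vs. unique count
```

If `len(metrics)` exceeds `len(counts)`, at least one metric is being counted twice in the stated total.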

zhimin-z commented 8 months ago

Is this a similar question? https://github.com/stanford-crfm/helm/issues/2034