Holistic Evaluation of Language Models (HELM) is a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). The framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
gsutil rsync is deprecated and users should use gcloud storage rsync instead.
Valentin also reports:
Quick comment on the downloading instructions: gsutil -m rsync -r $GCS_BENCHMARK_OUTPUT_PATH $LOCAL_BENCHMARK_OUTPUT_PATH doesn't work for Windows machines since the folder names contain colons (e.g., commonsense:dataset=openbookqa,method=multiple_choice_joint,model=google_gemini-1.0-pro-001), which is not allowed on Windows. A simple work-around is to use gcloud storage rsync -r $GCS_BENCHMARK_OUTPUT_PATH $LOCAL_BENCHMARK_OUTPUT_PATH instead, which automatically renames folders on Windows.
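To make the report above concrete, here is a minimal Python sketch that checks a run folder name for characters Windows rejects and assembles the suggested `gcloud storage rsync -r` call as an argument list. The helper names (`windows_invalid_chars`, `build_rsync_command`) and the bucket path `gs://example-bucket/benchmark_output` are hypothetical placeholders, not part of HELM or the gcloud CLI. The sketch does not try to reproduce how gcloud renames folders on Windows.

```python
# Characters that Windows does not allow in file or folder names.
WINDOWS_INVALID_CHARS = set('<>:"/\\|?*')


def windows_invalid_chars(name: str) -> list[str]:
    """Return the sorted characters in `name` that Windows rejects."""
    return sorted(set(name) & WINDOWS_INVALID_CHARS)


def build_rsync_command(source: str, destination: str) -> list[str]:
    """Build the `gcloud storage rsync -r` invocation as an argument list."""
    return ["gcloud", "storage", "rsync", "-r", source, destination]


# Run folder name quoted in the report above.
run_name = (
    "commonsense:dataset=openbookqa,method=multiple_choice_joint,"
    "model=google_gemini-1.0-pro-001"
)
print(windows_invalid_chars(run_name))

# Hypothetical source bucket and local destination, for illustration only.
command = build_rsync_command(
    "gs://example-bucket/benchmark_output", "benchmark_output"
)
print(" ".join(command))
```

Assuming the gcloud CLI is installed and authenticated, the resulting list could be passed to `subprocess.run(command, check=True)` to perform the sync.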