Holistic Evaluation of Language Models (HELM) is a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
Hi, in this PR I added RekaClient and the Vibe-Eval scenario. I did a canary run (50 instances) on the Vibe-Eval scenario using Qwen-VL-Chat. Here are the results:
run_spec.json scenario.json per_instance_stats.json
And for running the full Vibe-Eval scenario, the conf is like:

The credentials.conf file is like:

Please let me know how I can improve it, thanks!