stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
https://crfm.stanford.edu/helm
Apache License 2.0

using labelled data from LLMs #1922

Closed guillemram97 closed 9 months ago

guillemram97 commented 11 months ago

I want to store the answers from multiple models on multiple tasks to do Knowledge Distillation, but I'm broke and can't afford to run them myself. I was thinking of using data from this project. Can we access the full JSON files via an API/dataset? E.g., if we wanted to get the responses of openai/text-davinci-003 on babi_qa, is there a way to do it other than looking for the run on the project site and then downloading the full JSON?

yifanmai commented 11 months ago

Right now, the raw run JSON files are the best way to get this data. You can access them through "Full JSON" (the scenario_state.json file). The schema for this file is defined in this Python dataclass; you could write a Python script to drill down to the specific data you need.
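
For example, here is a minimal sketch of such a script. It assumes the "Full JSON" link resolves to a scenario_state.json file served over HTTP (the URL below is illustrative; copy the actual link from the run page, since the storage path and run-name format can differ across HELM releases), and that the file contains a `request_states` list whose entries carry the prompt under `request.prompt` and the model outputs under `result.completions[].text`, per the ScenarioState dataclass:

```python
import json
import requests

# Illustrative URL only: copy the real "Full JSON" link from the run's page on the
# HELM website. The version prefix and run name shown here are assumptions.
SCENARIO_STATE_URL = (
    "https://storage.googleapis.com/crfm-helm-public/benchmark_output/runs/"
    "v0.2.4/babi_qa:task=all,model=openai_text-davinci-003/scenario_state.json"
)


def fetch_completions(url: str):
    """Download a scenario_state.json file and yield (prompt, completion_text) pairs."""
    scenario_state = requests.get(url, timeout=60).json()
    for request_state in scenario_state.get("request_states", []):
        prompt = request_state["request"]["prompt"]
        result = request_state.get("result") or {}
        for completion in result.get("completions", []):
            yield prompt, completion["text"]


if __name__ == "__main__":
    # Dump prompt/completion pairs as JSON lines, e.g. for a distillation dataset.
    for prompt, completion in fetch_completions(SCENARIO_STATE_URL):
        print(json.dumps({"prompt": prompt, "completion": completion}))
```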