stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
https://crfm.stanford.edu/helm
Apache License 2.0

Adapter parameters for LSAT, BABI #434

Closed — rishibommasani closed this issue 2 years ago

rishibommasani commented 2 years ago

@dorarad Please specify the various adapter parameters; currently they are `temperature = 1` and `max_train_instances = 2`, which I don't think makes sense. You can Ctrl+F for your name in run_specs.py; this is also noted in #268 and #312.

dorarad commented 2 years ago

For `max_train_instances` I used 2 because larger values didn't fit in the context window: any value above that triggered a warning about pruning the in-context examples, so I reduced it to 2. For the temperature, yeah, I didn't explore different values. My starting point was the config of another task (in which the temperature was 1), and I kept that. I think the common value used for other tasks should be used here too. LSAT is a pretty standard reading-comprehension task.
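For context, these knobs live on HELM's `AdapterSpec`. Here is a minimal sketch of the kind of settings under discussion; the stand-in dataclass below only mirrors a few `AdapterSpec` field names, and the defaults chosen are assumptions for illustration, not HELM's actual values:

```python
from dataclasses import dataclass

# Illustrative stand-in for HELM's AdapterSpec (the real class lives in the
# benchmarking repo and has many more fields).
@dataclass
class AdapterSpec:
    method: str = "multiple_choice"
    max_train_instances: int = 2   # kept at 2 so in-context examples fit the context window
    temperature: float = 0.0       # deterministic decoding, typical for QA-style tasks
    max_tokens: int = 1            # multiple-choice answers are a single letter

# Hypothetical LSAT config: few in-context examples (to avoid pruning warnings),
# low temperature as for other reading-comprehension tasks.
lsat_spec = AdapterSpec(method="multiple_choice", max_train_instances=2, temperature=0.0)
```

The point of the sketch is just that temperature and the number of in-context examples are independent settings: the former should match the convention for comparable tasks, while the latter is bounded by the context window.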

rishibommasani commented 2 years ago

@dilarasoylu Is this resolved; if so, can you close?

dorarad commented 2 years ago

Not resolved yet, but I've seen Tony's Slack message on the mercury channel about the next run and will make sure to do it by tomorrow!

rishibommasani commented 2 years ago

Fantastic, thanks Dor!

dorarad commented 2 years ago

Alright, I made a pull request: https://github.com/stanford-crfm/benchmarking/pull/494. I'm not 100% sure about the number of tokens for bAbI. It could help models if we tell them they can assume one-word answers, but that indeed makes assumptions about tokenization, so we could either set a larger but still small value or remove the limit.
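The tokenization concern above can be made concrete: a one-word answer is not necessarily one token, since subword tokenizers may split a word into several pieces. A minimal sketch of the trade-off, using the same illustrative stand-in for `AdapterSpec` (the class and the value 5 are assumptions, not what the PR sets):

```python
from dataclasses import dataclass

# Illustrative stand-in for HELM's AdapterSpec (generation-style fields only).
@dataclass
class AdapterSpec:
    method: str = "generation"
    max_tokens: int = 5  # small but > 1: one word may span several subword tokens

# Hypothetical bAbI config: answers are a single word, but a budget of exactly
# 1 token could truncate words the tokenizer splits, so leave a small margin.
babi_spec = AdapterSpec(method="generation", max_tokens=5)
```

Setting `max_tokens` to a small value larger than 1 keeps generations short without betting on how any particular tokenizer segments the answer word.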