stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
https://crfm.stanford.edu/helm
Apache License 2.0

VHELM setup #3183

Open smsarov opened 5 days ago

smsarov commented 5 days ago

I've followed all the steps for the VHELM setup and got this error:

```
(crfm-helm) smsarov@MacBook-Air-Alexander diploma % helm-run --run-entries mmmu:subject=Accounting,model=openai/gpt-4o-mini-2024-07-18 --suite my-vhelm-suite --max-eval-instances 10
main {
  Reading tokenizer configs from /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/helm/config/tokenizer_configs.yaml...
  Reading model deployments from /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/helm/config/model_deployments.yaml...
} [0.717s]
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/bin/helm-run", line 8, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/helm/common/hierarchical_logger.py", line 104, in wrapper
    return fn(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/helm/benchmark/run.py", line 332, in main
    run_specs = run_entries_to_run_specs(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/helm/benchmark/run.py", line 43, in run_entries_to_run_specs
    for run_spec in construct_run_specs(parse_object_spec(entry.description)):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/helm/benchmark/run_spec_factory.py", line 64, in construct_run_specs
    run_specs: List[RunSpec] = [run_spec_function(**args)]
TypeError: get_mmmu_spec() missing 1 required positional argument: 'question_type'
```

I tried running the quick start for regular HELM, and it worked as expected.

ImKeTT commented 4 days ago

Thanks for your interest in using the framework! There are two question types in MMMU, open and multiple-choice, so you have to pass `question_type=multiple-choice` when testing on this task. You can refer to `run_entries_vhelm.conf` for the arguments used by the other VHELM tasks: https://github.com/stanford-crfm/helm/blob/ee10b8fd5a0f46949c98cd7a0a79cb7e1b163073/src/helm/benchmark/presentation/run_entries_vhelm.conf#L82
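Concretely, adding the missing argument to the run entry from your command should resolve the `TypeError` (assuming the multiple-choice subset is the one you want to evaluate):

```
helm-run \
  --run-entries mmmu:subject=Accounting,question_type=multiple-choice,model=openai/gpt-4o-mini-2024-07-18 \
  --suite my-vhelm-suite \
  --max-eval-instances 10
```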