Closed: richardzhuang0412 closed this issue 3 months ago.
Could you provide the complete logs from your run (as a shared file, a file attachment or GitHub Gist)?
Here is the log:
/data/richard/helm (main) » helm-run \
  --run-entries med_qa:model=NousResearch/Meta-Llama-3-8B \
  --enable-huggingface-models NousResearch/Meta-Llama-3-8B \
  --suite v1 \
  --max-eval-instances 10
main {
Reading tokenizer configs from /data/tianhao/miniconda3/envs/crfm-helm/lib/python3.8/site-packages/helm/config/tokenizer_configs.yaml...
Reading model deployments from /data/tianhao/miniconda3/envs/crfm-helm/lib/python3.8/site-packages/helm/config/model_deployments.yaml...
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Registered default metadata for model NousResearch/Meta-Llama-3-8B
1 entries produced 1 run specs
run_specs {
  RunSpec(name='med_qa:model=NousResearch_Meta-Llama-3-8B', scenario_spec=ScenarioSpec(class_name='helm.benchmark.scenarios.med_qa_scenario.MedQAScenario', args={}), adapter_spec=AdapterSpec(method='multiple_choice_joint', global_prefix='', global_suffix='', instructions='The following are multiple choice questions (with answers) about medicine.\n', input_prefix='Question: ', input_suffix='\n', reference_prefix='A. ', reference_suffix='\n', output_prefix='Answer: ', output_suffix='\n', instance_prefix='\n', substitutions=[], max_train_instances=5, max_eval_instances=10, num_outputs=5, num_train_trials=1, num_trials=1, sample_train=True, model_deployment='NousResearch/Meta-Llama-3-8B', model='NousResearch/Meta-Llama-3-8B', temperature=0.0, max_tokens=1, stop_sequences=['\n'], random=None, multi_label=False, image_generation_parameters=None, eval_splits=None), metric_specs=[MetricSpec(class_name='helm.benchmark.metrics.basic_metrics.BasicGenerationMetric', args={'names': ['exact_match', 'quasi_exact_match', 'prefix_exact_match', 'quasi_prefix_exact_match']}), MetricSpec(class_name='helm.benchmark.metrics.basic_metrics.BasicReferenceMetric', args={}), MetricSpec(class_name='helm.benchmark.metrics.basic_metrics.InstancesPerSplitMetric', args={})], data_augmenter_spec=DataAugmenterSpec(perturbation_specs=[], should_augment_train_instances=False, should_include_original_train=False, should_skip_unchanged_train=False, should_augment_eval_instances=False, should_include_original_eval=False, should_skip_unchanged_eval=False, seeds_per_instance=1), groups=['med_qa'], annotators=None)
} [0.0s]
Running in local mode with base path: prod_env
Looking in path: prod_env
AutoTokenizer: cache_backend_config = SqliteCacheBackendConfig(path='prod_env/cache')
AutoClient: file_storage_path = prod_env/cache
AutoClient: cache_backend_config = SqliteCacheBackendConfig(path='prod_env/cache')
AutoTokenizer: cache_backend_config = SqliteCacheBackendConfig(path='prod_env/cache')
Found 1 account(s).
Looking in path: prod_env
AnnotatorFactory: file_storage_path = prod_env/cache
AnnotatorFactory: cache_backend_config = SqliteCacheBackendConfig(path='prod_env/cache')
0%| | 0/1 [00:00<?, ?it/s]
Running med_qa:model=NousResearch_Meta-Llama-3-8B {
  scenario.get_instances {
    ensure_file_downloaded {
    } [0.0s]
  } [0.0s]
} [0.002s]
Error when running med_qa:model=NousResearch_Meta-Llama-3-8B:
Traceback (most recent call last):
File "/data/tianhao/miniconda3/envs/crfm-helm/lib/python3.8/site-packages/helm/common/general.py", line 89, in ensure_file_downloaded
import gdown  # noqa
ModuleNotFoundError: No module named 'gdown'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data/tianhao/miniconda3/envs/crfm-helm/lib/python3.8/site-packages/helm/benchmark/runner.py", line 216, in run_all
self.run_one(run_spec)
File "/data/tianhao/miniconda3/envs/crfm-helm/lib/python3.8/site-packages/helm/benchmark/runner.py", line 255, in run_one
instances = scenario.get_instances(scenario_output_path)
File "/data/tianhao/miniconda3/envs/crfm-helm/lib/python3.8/site-packages/helm/benchmark/scenarios/med_qa_scenario.py", line 63, in get_instances
ensure_file_downloaded(
File "/data/tianhao/miniconda3/envs/crfm-helm/lib/python3.8/site-packages/helm/common/hierarchical_logger.py", line 104, in wrapper
return fn(*args, **kwargs)
File "/data/tianhao/miniconda3/envs/crfm-helm/lib/python3.8/site-packages/helm/common/general.py", line 91, in ensure_file_downloaded
handle_module_not_found_error(e, ["scenarios"])
File "/data/tianhao/miniconda3/envs/crfm-helm/lib/python3.8/site-packages/helm/common/optional_dependencies.py", line 14, in handle_module_not_found_error
raise OptionalDependencyNotInstalled(
helm.common.optional_dependencies.OptionalDependencyNotInstalled: Optional dependency gdown is not installed. Please run pip install crfm-helm[scenarios]
or pip install crfm-helm[all]
to install it.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 62.08it/s]
} [7.004s]
Traceback (most recent call last):
File "/data/tianhao/miniconda3/envs/crfm-helm/bin/helm-run", line 8, in <module>
It seems to be a dependency problem, but I was not able to run either of the suggested commands, "pip install crfm-helm[scenarios]" or "pip install crfm-helm[all]".
I tried pip install gdown and that seems to work, so I guess the problem is solved. But could you let me know how pip install crfm-helm[scenarios] or pip install crfm-helm[all] works?
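On the question of how pip install crfm-helm[scenarios] works: the bracketed name selects an optional dependency group (an "extra") declared in the package's metadata, so pip installs crfm-helm plus the extra packages in that group (gdown among them), while crfm-helm[all] pulls in every optional group. A hypothetical sketch of how such extras are declared with setuptools (illustrative only, not HELM's actual packaging configuration):

# Hypothetical setup.py sketch showing how pip "extras" are declared
# (illustrative only; not HELM's actual packaging configuration).
from setuptools import setup, find_packages

setup(
    name="crfm-helm",
    packages=find_packages(),
    install_requires=["tqdm"],  # base dependencies, always installed
    extras_require={
        # `pip install 'crfm-helm[scenarios]'` installs the base deps plus these:
        "scenarios": ["gdown"],
        # `pip install 'crfm-helm[all]'` would list the union of all optional groups:
        "all": ["gdown"],
    },
)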
Could you provide the logs from running pip install crfm-helm[scenarios]
in your shell with your conda environment activated? I would expect the command to "just work".
For zsh, could you try instead running:
pip install 'crfm-helm[scenarios]'
(with the single quotes)? zsh treats the square brackets as glob-pattern characters, so the extras specifier has to be quoted.
Oh yes that is working. Thank you so much!
Hi Yifan,
Do you know what I should do if I want to increase evaluation speed by running parallel inference on multiple GPUs?
For example, I am currently using this command:
helm-run \
  --run-entries boolq:model=NousResearch/Meta-Llama-3-8B \
  --enable-huggingface-models NousResearch/Meta-Llama-3-8B \
  --suite v1 \
  --max-eval-instances 10
I am also unable to run, for example, Llama-3-70B right now. Even if I specify CUDA_AVAILABLE_DEVICES=0,1,2,3,4,5,6,7, it still gives an out-of-memory error on CUDA device 0.
In general, we don't support parallel inference in HELM. Sorry about that.
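For the out-of-memory part of the question: HELM itself will not parallelize inference, but a model that is too large for one GPU can sometimes be loaded outside HELM by sharding it across devices with Hugging Face transformers and accelerate. A rough sketch under those assumptions (not a HELM feature; note also that the standard environment variable is CUDA_VISIBLE_DEVICES, not CUDA_AVAILABLE_DEVICES):

# Rough sketch, not a HELM feature: shard a large model across all visible GPUs
# using Hugging Face transformers + accelerate (assumes `pip install accelerate`).
# The environment variable that limits which GPUs are visible is CUDA_VISIBLE_DEVICES.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Meta-Llama-3-70B"  # hypothetical example checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # split layers across the visible GPUs
    torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory
)

inputs = tokenizer("Question: ...", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))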
I was testing evaluation using this code:
However, this error occurs: "helm.benchmark.runner.RunnerError: Failed runs: ["med_qa:model=NousResearch_Meta-Llama-3-8B"]". I was able to run the evaluation for GSM8K using the same command with "med_qa" replaced by "gsm". Did I do something wrong?