stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
https://crfm.stanford.edu/helm

Issue with running HEIM #3080

Open sudhir-mcw opened 1 month ago

sudhir-mcw commented 1 month ago

Hi @teetone, I am trying out HEIM with the following command from the HEIM documentation and am running into the issue below:

helm-run --run-entries mscoco:model=huggingface/stable-diffusion-v1-4 --suite my-heim-suite --max-eval-instances 1

HuggingFaceDiffusersClient error: Failed to import diffusers.pipelines.stable_diffusion because of the following error (look up to see its traceback):
'Config' object has no attribute 'define_bool_state'
Request failed. Retrying (attempt #2) in 10 seconds... (See above for error details)

File "helm/src/helm/benchmark/window_services/window_service_factory.py", line 17, in get_window_service model_deployment: Optional[ModelDeployment] = get_model_deployment(model_deployment_name) File "helm/src/helm/benchmark/model_deployment_registry.py", line 132, in get_model_deployment raise ValueError(f"Model deployment {name} not found") ValueError: Model deployment openai/clip-vit-large-patch14 not found

0%| | 0/1 [00:35<?, ?it/s] } [37.279s]
Traceback (most recent call last):
  File "helm/src/helm/benchmark/run.py", line 380, in <module>
    main()
  File "helm/src/helm/common/hierarchical_logger.py", line 104, in wrapper
    return fn(*args, **kwargs)
  File "helm/src/helm/benchmark/run.py", line 351, in main
    run_benchmarking(
  File "helm/src/helm/benchmark/run.py", line 128, in run_benchmarking
    runner.run_all(run_specs)
  File "helm/src/helm/benchmark/runner.py", line 226, in run_all
    raise RunnerError(f"Failed runs: [{failed_runs_str}]")
helm.benchmark.runner.RunnerError: Failed runs: ["mscoco:model=huggingface_stable-diffusion-v1-4"]

Here is some information on my setup: conda environment, Python 3.9.20. I installed HEIM from source rather than via pip, as the pip install was taking quite a long time to resolve the dependencies. Here are the steps I used to install:

cd helm
pip install -r requirements.txt
pip install -e .[all]

I checked the community forum and tried upgrading jax to the latest version as well, but still no luck:

jax==0.4.30
jaxlib==0.4.30
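
For reference, the pins above were applied with pip inside the same conda environment, roughly:

pip install jax==0.4.30 jaxlib==0.4.30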

Is there any other installation or quick-start documentation related to HEIM apart from heim.md in the docs?

yifanmai commented 1 month ago

The likely cause is that you have not run install-heim-extras.sh as explained in the HEIM docs; could you try that and see if that fixes things?
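
For reference, the script lives at the root of the repository, so with a from-source checkout like yours it should be something like:

cd helm
bash install-heim-extras.sh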

Sorry that this was not clearly explained in the documentation. I've updated the documentation to make things clearer.

sudhir-mcw commented 1 month ago

Hi @yifanmai, thanks for the reply. I tried once again after running install-heim-extras.sh, but the process gets interrupted with the following error:

AestheticsMetric() {
    Parallelizing computation on 1 items over 4 threads {

        100%|██████████| 1/1 [00:08<00:00, 8.58s/it] } [8.579s]
    } [8.58s]
    CLIPScoreMetric(multilingual=False) {
        Parallelizing computation on 1 items over 4 threads {
            0%| | 0/1 [00:00<?, ?it/s] } [0.002s]
        } [0.002s]
    } [14.125s]
} [6m14.466s]
Error when running mscoco:model=huggingface_stable-diffusion-v1-4:
Traceback (most recent call last):
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/runner.py", line 216, in run_all
    self.run_one(run_spec)
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/runner.py", line 307, in run_one
    metric_result: MetricResult = metric.evaluate(
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/metrics/metric.py", line 143, in evaluate
    results: List[List[Stat]] = parallel_map(
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/common/general.py", line 235, in parallel_map
    results = list(tqdm(executor.map(process, items), total=len(items), disable=None))
  File "/mnt/gpu-perf-test-storage/sudhir/miniconda3/envs/crfm-helm/lib/python3.9/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/mnt/gpu-perf-test-storage/sudhir/miniconda3/envs/crfm-helm/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "/mnt/gpu-perf-test-storage/sudhir/miniconda3/envs/crfm-helm/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/mnt/gpu-perf-test-storage/sudhir/miniconda3/envs/crfm-helm/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/mnt/gpu-perf-test-storage/sudhir/miniconda3/envs/crfm-helm/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/metrics/metric.py", line 77, in process
    self.metric.evaluate_generation(
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/metrics/image_generation/clip_score_metrics.py", line 58, in evaluate_generation
    prompt = WindowServiceFactory.get_window_service(model, metric_service).truncate_from_right(prompt)
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/window_services/window_service_factory.py", line 17, in get_window_service
    model_deployment: Optional[ModelDeployment] = get_model_deployment(model_deployment_name)
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/model_deployment_registry.py", line 130, in get_model_deployment
    raise ValueError(f"Model deployment {name} not found")
ValueError: Model deployment openai/clip-vit-large-patch14 not found

100%|██████████| 1/1 [06:14<00:00, 374.49s/it] } [6m21.356s]
Traceback (most recent call last):
  File "/mnt/gpu-perf-test-storage/sudhir/miniconda3/envs/crfm-helm/bin/helm-run", line 8, in <module>
    sys.exit(main())
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/common/hierarchical_logger.py", line 104, in wrapper
    return fn(*args, **kwargs)
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/run.py", line 350, in main
    run_benchmarking(
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/run.py", line 127, in run_benchmarking
    runner.run_all(run_specs)
  File "/mnt/gpu-perf-test-storage/sudhir/helm/src/helm/benchmark/runner.py", line 225, in run_all
    raise RunnerError(f"Failed runs: [{failed_runs_str}]")
helm.benchmark.runner.RunnerError: Failed runs: ["mscoco:model=huggingface_stable-diffusion-v1-4"]

It runs fine up through the aesthetics metric, but it stops at the CLIPScore calculation. Is there any configuration I am missing?
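
For reference, my reading of the traceback (a simplified sketch built only from the names that appear in the log above, not the actual HELM source) is that CLIPScoreMetric asks the model deployment registry for openai/clip-vit-large-patch14 when truncating the prompt, and the lookup fails because that deployment was never registered:

# Simplified sketch of the failing lookup chain, reconstructed from the traceback above;
# the real HELM code differs in detail.
from typing import Dict, Optional

class ModelDeployment:
    def __init__(self, name: str):
        self.name = name

# Registry that HELM appears to populate from its model deployment configs at startup.
_registry: Dict[str, ModelDeployment] = {}

def get_model_deployment(name: str) -> ModelDeployment:
    # This is the step that raises in model_deployment_registry.py.
    deployment: Optional[ModelDeployment] = _registry.get(name)
    if deployment is None:
        raise ValueError(f"Model deployment {name} not found")
    return deployment

get_model_deployment("openai/clip-vit-large-patch14")  # -> ValueError, as in the log above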

yifanmai commented 4 days ago

I'm able to reproduce this myself. @teetone would you know what's happening here?