logikon-ai / cot-eval

A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
https://huggingface.co/spaces/logikon/open_cot_leaderboard
MIT License

Evaluate: google/gemma-7b #23

Closed. ggbetz closed this issue 2 months ago.

ggbetz commented 3 months ago

Check:

Parameters:

```
NEXT_MODEL_PATH=google/gemma-7b
NEXT_MODEL_REVISION=main
NEXT_MODEL_PRECISION=bfloat16
MAX_LENGTH=2048
GPU_MEMORY_UTILIZATION=0.8
VLLM_SWAP_SPACE=6
```
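
For reference, a rough sketch of how these settings would map onto a vLLM engine, assuming cot-eval forwards them to `vllm.LLM` (the exact wiring inside cot-eval may differ):

```python
# Sketch only: assumes the env vars above are passed through to vllm.LLM
# roughly like this (the comments name the corresponding env var).
from vllm import LLM

llm = LLM(
    model="google/gemma-7b",     # NEXT_MODEL_PATH
    revision="main",             # NEXT_MODEL_REVISION
    dtype="bfloat16",            # NEXT_MODEL_PRECISION
    max_model_len=2048,          # MAX_LENGTH
    gpu_memory_utilization=0.8,  # GPU_MEMORY_UTILIZATION (fraction of GPU memory)
    swap_space=6,                # VLLM_SWAP_SPACE (GiB of CPU swap per GPU)
)
```
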
yakazimir commented 3 months ago

It seems like the Gemma models might be gated now:


```
2024-04-09T04:20:55.274781365Z Make sure to have access to it at https://huggingface.co/google/gemma-7b-it.
2024-04-09T04:20:55.274783755Z 403 Client Error. (Request ID: Root=1-6614c227-3208ca3a48e755de34c72f05;44455daf-c987-4a57-a270-b9c55791144c)
2024-04-09T04:20:55.274786125Z 
2024-04-09T04:20:55.274788335Z Cannot access gated repo for url https://huggingface.co/google/gemma-7b-it/resolve/main/config.json.
2024-04-09T04:20:55.274790695Z Access to model google/gemma-7b-it is restricted and you are not in the authorized list. Visit https://huggingface.co/google/gemma-7b-it to ask for access.
```

yakazimir commented 3 months ago

Fixed it for myself; the model is running now.
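
For anyone hitting the same 403: the usual fix is to request access on the model page and then authenticate the run with a Hugging Face token. A minimal sketch, assuming a token with access to the gated repo is available in the environment:

```python
# Sketch only: authenticate so gated repos like google/gemma-7b resolve.
import os
from huggingface_hub import login

# HF_TOKEN is assumed to hold an access token for an account that has
# been granted access to the gated model.
login(token=os.environ["HF_TOKEN"])
```
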

yakazimir commented 2 months ago

Still really struggling with this and the swap-space issue after several runs (around 4, I think):


```
Processed prompts:  28%|██▊       | 64/230 [08:52<2:45:10, 59.70s/it]
Map:   0%|          | 0/230 [10:45<?, ? examples/s]
2024-05-08T00:44:23.048300508Z Traceback (most recent call last):
2024-05-08T00:44:23.048326935Z   File "/usr/local/bin/cot-eval", line 8, in <module>
2024-05-08T00:44:23.048335541Z     sys.exit(main())
2024-05-08T00:44:23.048341438Z   File "/workspace/cot-eval/src/cot_eval/__main__.py", line 171, in main
2024-05-08T00:44:23.048355121Z     cot_data[task] = run_chain_on_task(task_data[task], chain)
2024-05-08T00:44:23.048361233Z   File "/workspace/cot-eval/src/cot_eval/__main__.py", line 105, in run_chain_on_task
2024-05-08T00:44:23.048369046Z     task_ds = task_ds.map(add_reasoning, batched=True, batch_size=2048, load_from_cache_file=False)
2024-05-08T00:44:23.048374980Z   File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 593, in wrapper
2024-05-08T00:44:23.048463044Z     out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
2024-05-08T00:44:23.048488628Z   File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 558, in wrapper
2024-05-08T00:44:23.048535258Z     out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
2024-05-08T00:44:23.048545490Z   File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 3105, in map
2024-05-08T00:44:23.048870052Z     for rank, done, content in Dataset._map_single(**dataset_kwargs):
2024-05-08T00:44:23.048879958Z   File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 3482, in _map_single
2024-05-08T00:44:23.049259319Z     batch = apply_function_on_filtered_inputs(
2024-05-08T00:44:23.049272024Z   File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 3361, in apply_function_on_filtered_inputs
2024-05-08T00:44:23.049617346Z     processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
2024-05-08T00:44:23.049627849Z   File "/workspace/cot-eval/src/cot_eval/__main__.py", line 102, in add_reasoning
2024-05-08T00:44:23.049642611Z     reasoning_traces = chain.batch(input_batch)
2024-05-08T00:44:23.049649204Z   File "/usr/local/lib/python3.10/dist-packages/langchain_core/runnables/base.py", line 2643, in batch
2024-05-08T00:44:23.049952530Z     inputs = step.batch(
2024-05-08T00:44:23.049977314Z   File "/usr/local/lib/python3.10/dist-packages/langchain_core/runnables/base.py", line 4550, in batch
2024-05-08T00:44:23.050431353Z     return self.bound.batch(
2024-05-08T00:44:23.050443778Z   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 340, in batch
2024-05-08T00:44:23.050479887Z     raise e
2024-05-08T00:44:23.050493220Z   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 327, in batch
2024-05-08T00:44:23.050530315Z     llm_result = self.generate_prompt(
2024-05-08T00:44:23.050539214Z   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 633, in generate_prompt
2024-05-08T00:44:23.050625956Z     return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
2024-05-08T00:44:23.050639213Z   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 803, in generate
2024-05-08T00:44:23.050701109Z     output = self._generate_helper(
2024-05-08T00:44:23.050713753Z   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 670, in _generate_helper
2024-05-08T00:44:23.050798704Z     raise e
2024-05-08T00:44:23.050807620Z   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 657, in _generate_helper
2024-05-08T00:44:23.050883262Z     self._generate(
2024-05-08T00:44:23.050892032Z   File "/usr/local/lib/python3.10/dist-packages/langchain_community/llms/vllm.py", line 132, in _generate
2024-05-08T00:44:23.050932473Z     outputs = self.client.generate(prompts, sampling_params)
2024-05-08T00:44:23.050941522Z   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 190, in generate
2024-05-08T00:44:23.050975799Z     return self._run_engine(use_tqdm)
2024-05-08T00:44:23.050984448Z   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 218, in _run_engine
2024-05-08T00:44:23.051012892Z     step_outputs = self.llm_engine.step()
2024-05-08T00:44:23.051021715Z   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 673, in step
2024-05-08T00:44:23.051095959Z     seq_group_metadata_list, scheduler_outputs = self.scheduler.schedule()
2024-05-08T00:44:23.051104821Z   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 442, in schedule
2024-05-08T00:44:23.051151680Z     scheduler_outputs = self._schedule()
2024-05-08T00:44:23.051160389Z   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 334, in _schedule
2024-05-08T00:44:23.051198090Z     self._preempt(victim_seq_group, blocks_to_swap_out)
2024-05-08T00:44:23.051207149Z   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 562, in _preempt
2024-05-08T00:44:23.051265686Z     self._preempt_by_swap(seq_group, blocks_to_swap_out)
2024-05-08T00:44:23.051275036Z   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 585, in _preempt_by_swap
2024-05-08T00:44:23.051332195Z     self._swap_out(seq_group, blocks_to_swap_out)
2024-05-08T00:44:23.051341166Z   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 606, in _swap_out
2024-05-08T00:44:23.051398579Z     raise RuntimeError(
2024-05-08T00:44:23.051407095Z RuntimeError: Aborted due to the lack of CPU swap space. Please increase the swap space to avoid this error.
2024-05-08T00:44:25.835784537Z 
Processed prompts:  28%|██▊       | 64/230 [10:47<27:59, 10.12s/it]
```

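For context on the failure mode: the traceback bottoms out in vLLM's scheduler, which, when it runs out of GPU KV-cache blocks, preempts running sequence groups by swapping their cache blocks into a CPU pool whose size is set by `swap_space` (here `VLLM_SWAP_SPACE=6`, i.e. 6 GiB). Once that pool is exhausted, `_swap_out` raises the `RuntimeError` above, so the remedies are a larger swap space or more GPU memory left free for the KV cache.
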
ggbetz commented 2 months ago

I see, so have you tried:

```
GPU_MEMORY_UTILIZATION=0.5
VLLM_SWAP_SPACE=16
```
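
Raising `VLLM_SWAP_SPACE` to 16 GiB is what directly addresses the `RuntimeError`, since it enlarges the CPU pool that preempted sequences swap into; lowering `GPU_MEMORY_UTILIZATION` to 0.5 should leave more headroom on the card for other allocations, at the cost of a smaller on-GPU KV cache.
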
yakazimir commented 2 months ago

Trying now, also on a different computer.

ggbetz commented 2 months ago

Completed.