Closed · ggbetz closed this issue 4 days ago
I got:

Please use Flashinfer backend for models with logits_soft_cap (i.e., Gemma-2).
Otherwise, the output might be wrong. Set Flashinfer backend by export
VLLM_ATTENTION_BACKEND=FLASHINFER. (type=value_error)
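The fix the error message asks for can be sketched as a shell snippet (the variable name and value are taken directly from the message; the verification step is just an illustration):

```shell
# Select the FlashInfer attention backend before launching vLLM, as the
# error message instructs (required for models with logits_soft_cap,
# such as Gemma-2):
export VLLM_ATTENTION_BACKEND=FLASHINFER

# Confirm the variable is set in the current session:
echo "$VLLM_ATTENTION_BACKEND"
```

Note that the variable must be exported in the same environment (e.g., the same container shell) that launches the vLLM process.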
We might consider re-running the evals for Gemma 1.
I've added FlashInfer to our Docker container, but I still get an error when trying to run and evaluate Gemma 2:
INFO 08-01 10:22:08 selector.py:79] Using Flashinfer backend.
WARNING 08-01 10:22:08 selector.py:80] Flashinfer will be stuck on llama-2-7b, please avoid using Flashinfer as the backend when running on llama-2-7b.
INFO 08-01 10:22:08 weight_utils.py:218] Using model weights format ['*.safetensors']
INFO 08-01 10:23:11 model_runner.py:255] Loading model weights took 4.9975 GB
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/local/bin/cot-eval", line 8, in <module>
[rank0]: sys.exit(main())
[rank0]: File "/workspace/cot-eval/src/cot_eval/__main__.py", line 149, in main
[rank0]: llm = VLLM(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 339, in __init__
[rank0]: values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 1050, in validate_model
[rank0]: input_data = validator(cls_, input_data)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/langchain_core/utils/pydantic.py", line 146, in wrapper
[rank0]: return func(cls, values)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/langchain_community/llms/vllm.py", line 89, in validate_environment
[rank0]: values["client"] = VLLModel(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 149, in __init__
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 414, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 256, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 353, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 76, in determine_num_available_blocks
[rank0]: return self.driver_worker.determine_num_available_blocks()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 173, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 874, in profile_run
[rank0]: self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1221, in execute_model
[rank0]: model_input.attn_metadata.begin_forward()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/flashinfer.py", line 132, in begin_forward
[rank0]: self.prefill_wrapper.begin_forward(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 791, in begin_forward
[rank0]: self._wrapper.begin_forward(
[rank0]: RuntimeError: CHECK_EQ(paged_kv_indptr.size(0), batch_size + 1) failed. 1 vs 257
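The failing check compares the length of `paged_kv_indptr` against `batch_size + 1`: FlashInfer expects a CSR-style offset array with one entry per sequence plus a leading zero, and here it received 1 entry where 257 (i.e., a batch of 256) were expected. A minimal illustration of that shape invariant (illustrative only, not vLLM or FlashInfer code; the helper name is hypothetical):

```python
def build_paged_kv_indptr(pages_per_seq):
    """Build a CSR-style offset array over the KV-cache pages of a batch.

    For a batch of N sequences the result has N + 1 entries: a leading 0
    followed by the cumulative page counts. This is the invariant behind
    the failed CHECK_EQ(paged_kv_indptr.size(0), batch_size + 1).
    """
    indptr = [0]
    for num_pages in pages_per_seq:
        indptr.append(indptr[-1] + num_pages)
    return indptr

# For the batch size in the traceback (257 = 256 + 1):
batch_size = 256
indptr = build_paged_kv_indptr([1] * batch_size)
assert len(indptr) == batch_size + 1  # 257 entries, as the check expects
```

The traceback thus points at vLLM handing FlashInfer a degenerate (length-1) indptr during the profiling run, which is what the linked upstream issue and PR address.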
We'll probably have to wait for the next vLLM release. See:
https://github.com/flashinfer-ai/flashinfer/issues/362
https://github.com/vllm-project/vllm/pull/7008