logikon-ai / cot-eval

A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
https://huggingface.co/spaces/logikon/open_cot_leaderboard
MIT License

Evaluate: CohereForAI/c4ai-command-r-plus #44

Open ggbetz opened 3 months ago

ggbetz commented 3 months ago

Check upon issue creation:

Parameters:

NEXT_MODEL_PATH=CohereForAI/c4ai-command-r-plus
NEXT_MODEL_REVISION=main
NEXT_MODEL_PRECISION=float16
MAX_LENGTH=2048 
GPU_MEMORY_UTILIZATION=0.8
VLLM_SWAP_SPACE=16

ToDos:

yakazimir commented 2 months ago

Is this 104 billion parameters?

ggbetz commented 2 months ago

I fear so.

Maybe postpone until we have together.ai support, right?

yakazimir commented 2 months ago

Yeah, I don't think it will be easy to run.
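
For a rough sense of scale, the weight footprint can be estimated from the parameter count and the precision. The sketch below is a back-of-the-envelope calculation, not a measurement. It takes the 104 billion parameters mentioned above, 2 bytes per float16 parameter, and GPU_MEMORY_UTILIZATION=0.8 from the issue parameters. The 80 GB per GPU figure is an assumption that is not stated in this thread, and the function names are hypothetical. KV cache and activation memory are ignored, so real requirements would be higher.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(weights_gb: float, gpu_memory_gb: float,
                         utilization: float) -> int:
    """Smallest GPU count whose usable memory can hold the weights alone."""
    usable_per_gpu_gb = gpu_memory_gb * utilization
    return math.ceil(weights_gb / usable_per_gpu_gb)


# 104e9 parameters at float16 (2 bytes each).
weights_gb = weight_memory_gb(104e9, 2)

# Assumption: 80 GB per GPU; 0.8 is GPU_MEMORY_UTILIZATION from the issue.
gpus = min_gpus_for_weights(weights_gb, gpu_memory_gb=80.0, utilization=0.8)

print(f"weights: {weights_gb:.0f} GB, GPUs needed for weights alone: {gpus}")
```

Under these assumptions the weights alone come to about 208 GB, which does not fit on a single GPU and has to be sharded across several.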

yakazimir commented 1 month ago

I tried to run it on some H100s. The hardware should be fine, but there is probably a vLLM issue here:

2024-06-09T01:17:34.650526432Z Traceback (most recent call last):
2024-06-09T01:17:34.650550937Z   File "/usr/local/bin/cot-eval", line 8, in <module>
2024-06-09T01:17:34.650625978Z     sys.exit(main())
2024-06-09T01:17:34.650629280Z   File "/workspace/cot-eval/src/cot_eval/__main__.py", line 149, in main
2024-06-09T01:17:34.650684105Z     llm = VLLM(
2024-06-09T01:17:34.650688295Z   File "/usr/local/lib/python3.10/dist-packages/langchain_core/load/serializable.py", line 120, in __init__
2024-06-09T01:17:34.650708283Z     super().__init__(**kwargs)
2024-06-09T01:17:34.650710047Z   File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 339, in __init__
2024-06-09T01:17:34.650779281Z     values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
2024-06-09T01:17:34.650781373Z   File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 1102, in validate_model
2024-06-09T01:17:34.650913230Z     values = validator(cls_, values)
2024-06-09T01:17:34.650915258Z   File "/usr/local/lib/python3.10/dist-packages/langchain_community/llms/vllm.py", line 88, in validate_environment
2024-06-09T01:17:34.650935652Z     values["client"] = VLLModel(
2024-06-09T01:17:34.650937500Z   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 112, in __init__
2024-06-09T01:17:34.650972176Z     self.llm_engine = LLMEngine.from_engine_args(
2024-06-09T01:17:34.650974087Z   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 196, in from_engine_args
2024-06-09T01:17:34.651018637Z     engine = cls(
2024-06-09T01:17:34.651020560Z   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 110, in __init__
2024-06-09T01:17:34.651040423Z     self.model_executor = executor_class(model_config, cache_config,
2024-06-09T01:17:34.651042706Z   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 62, in __init__
2024-06-09T01:17:34.651070210Z     self._init_workers_ray(placement_group)
2024-06-09T01:17:34.651072488Z   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 192, in _init_workers_ray
2024-06-09T01:17:34.651097110Z     self._run_workers(
2024-06-09T01:17:34.651098955Z   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 324, in _run_workers
2024-06-09T01:17:34.651152655Z     driver_worker_output = getattr(self.driver_worker,
2024-06-09T01:17:34.651154577Z   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 107, in load_model
2024-06-09T01:17:34.651173302Z     self.model_runner.load_model()
2024-06-09T01:17:34.651174918Z   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 95, in load_model
2024-06-09T01:17:34.651205321Z     self.model = get_model(
2024-06-09T01:17:34.651206893Z   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 101, in get_model
2024-06-09T01:17:34.651226346Z     model.load_weights(model_config.model, model_config.download_dir,
2024-06-09T01:17:34.651228052Z   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/commandr.py", line 325, in load_weights
2024-06-09T01:17:34.651286187Z     param = params_dict[name]
2024-06-09T01:17:34.651290524Z KeyError: 'model.layers.19.self_attn.k_norm.weight'
2024-06-09T01:17:36.936923543Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] Error executing method load_model. This might cause deadlock in distributed execution. [repeated 3x across cluster]
2024-06-09T01:17:36.936946060Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] Traceback (most recent call last): [repeated 3x across cluster]
2024-06-09T01:17:36.936948507Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/ray_utils.py", line 37, in execute_method [repeated 3x across cluster]
2024-06-09T01:17:36.936959586Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44]     return executor(*args, **kwargs) [repeated 3x across cluster]
2024-06-09T01:17:36.936961715Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 95, in load_model [repeated 6x across cluster]
2024-06-09T01:17:36.936963598Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44]     self.model_runner.load_model() [repeated 3x across cluster]
2024-06-09T01:17:36.936965279Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44]     self.model = get_model( [repeated 3x across cluster]
2024-06-09T01:17:36.936966750Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 101, in get_model [repeated 3x across cluster]
2024-06-09T01:17:36.936968802Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44]     model.load_weights(model_config.model, model_config.download_dir, [repeated 3x across cluster]
2024-06-09T01:17:36.936970607Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/commandr.py", line 325, in load_weights [repeated 3x across cluster]
2024-06-09T01:17:36.936972282Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44]     param = params_dict[name] [repeated 3x across cluster]
2024-06-09T01:17:36.936974756Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] KeyError: 'model.layers.19.self_attn.k_norm.weight' [repeated 3x across cluster]
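
The KeyError is raised at `param = params_dict[name]` in `load_weights`: a tensor name found in the checkpoint (`model.layers.19.self_attn.k_norm.weight`) has no matching entry among the parameters of the model that the installed vLLM builds. The following is a minimal sketch of that failure mode, together with a check that lists such names up front. It does not call vLLM; `find_unexpected_keys` and the example name lists are hypothetical and only illustrate the lookup pattern visible in the traceback.

```python
def find_unexpected_keys(checkpoint_names, model_param_names):
    """Return checkpoint tensor names that the model has no parameter for."""
    known = set(model_param_names)
    return sorted(name for name in checkpoint_names if name not in known)


# Hypothetical parameter names: this model only defines the q/k/v/o projections.
model_param_names = [
    "model.layers.19.self_attn.q_proj.weight",
    "model.layers.19.self_attn.k_proj.weight",
    "model.layers.19.self_attn.v_proj.weight",
    "model.layers.19.self_attn.o_proj.weight",
]

# The checkpoint additionally ships a k_norm weight (name taken from the log).
checkpoint_names = model_param_names + [
    "model.layers.19.self_attn.k_norm.weight",
]

# Same lookup pattern as in the traceback: a plain dict indexed by tensor name.
params_dict = {name: None for name in model_param_names}

failed_key = None
try:
    for name in checkpoint_names:
        param = params_dict[name]  # raises KeyError for the unknown name
except KeyError as err:
    failed_key = err.args[0]

unexpected = find_unexpected_keys(checkpoint_names, model_param_names)
print("lookup failed on:", failed_key)
print("unexpected checkpoint keys:", unexpected)
```

One plausible cause, not confirmed in this thread, is that the installed vLLM's `commandr.py` predates support for the `k_norm`/`q_norm` weights that this checkpoint contains, in which case trying a newer vLLM release would be a reasonable first step.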

yakazimir commented 1 month ago

There is a non-quantized version we could try; see the note here: https://huggingface.co/CohereForAI/c4ai-command-r-plus