ggbetz opened 3 months ago
Is this the 104-billion-parameter model?
I fear so.
Maybe postpone this until we have together.ai support?
Yeah, I don't think it will be that easy to run.
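For scale, a back-of-the-envelope memory estimate (weights only, fp16/bf16 assumed; the KV cache and activations need extra headroom on top of this):

```python
import math

# Rough sketch, not a benchmark: 104B parameters at 2 bytes each
# (fp16/bf16) for the weights alone, before KV cache and activations.
params = 104e9
bytes_per_param = 2
weights_gb = params * bytes_per_param / 1e9

h100_gb = 80  # H100 80GB
min_gpus = math.ceil(weights_gb / h100_gb)
print(f"{weights_gb:.0f} GB of weights -> at least {min_gpus}x H100 80GB")
# -> 208 GB of weights -> at least 3x H100 80GB (plus cache headroom)
```

So a handful of H100s is indeed the right ballpark for the unquantized weights.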
I tried to run it on some H100s, which should be enough, but there is probably a vLLM issue here:
2024-06-09T01:17:34.650526432Z Traceback (most recent call last):
2024-06-09T01:17:34.650550937Z File "/usr/local/bin/cot-eval", line 8, in <module>
2024-06-09T01:17:34.650625978Z sys.exit(main())
2024-06-09T01:17:34.650629280Z File "/workspace/cot-eval/src/cot_eval/__main__.py", line 149, in main
2024-06-09T01:17:34.650684105Z llm = VLLM(
2024-06-09T01:17:34.650688295Z File "/usr/local/lib/python3.10/dist-packages/langchain_core/load/serializable.py", line 120, in __init__
2024-06-09T01:17:34.650708283Z super().__init__(**kwargs)
2024-06-09T01:17:34.650710047Z File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 339, in __init__
2024-06-09T01:17:34.650779281Z values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
2024-06-09T01:17:34.650781373Z File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 1102, in validate_model
2024-06-09T01:17:34.650913230Z values = validator(cls_, values)
2024-06-09T01:17:34.650915258Z File "/usr/local/lib/python3.10/dist-packages/langchain_community/llms/vllm.py", line 88, in validate_environment
2024-06-09T01:17:34.650935652Z values["client"] = VLLModel(
2024-06-09T01:17:34.650937500Z File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 112, in __init__
2024-06-09T01:17:34.650972176Z self.llm_engine = LLMEngine.from_engine_args(
2024-06-09T01:17:34.650974087Z File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 196, in from_engine_args
2024-06-09T01:17:34.651018637Z engine = cls(
2024-06-09T01:17:34.651020560Z File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 110, in __init__
2024-06-09T01:17:34.651040423Z self.model_executor = executor_class(model_config, cache_config,
2024-06-09T01:17:34.651042706Z File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 62, in __init__
2024-06-09T01:17:34.651070210Z self._init_workers_ray(placement_group)
2024-06-09T01:17:34.651072488Z File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 192, in _init_workers_ray
2024-06-09T01:17:34.651097110Z self._run_workers(
2024-06-09T01:17:34.651098955Z File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 324, in _run_workers
2024-06-09T01:17:34.651152655Z driver_worker_output = getattr(self.driver_worker,
2024-06-09T01:17:34.651154577Z File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 107, in load_model
2024-06-09T01:17:34.651173302Z self.model_runner.load_model()
2024-06-09T01:17:34.651174918Z File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 95, in load_model
2024-06-09T01:17:34.651205321Z self.model = get_model(
2024-06-09T01:17:34.651206893Z File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 101, in get_model
2024-06-09T01:17:34.651226346Z model.load_weights(model_config.model, model_config.download_dir,
2024-06-09T01:17:34.651228052Z File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/commandr.py", line 325, in load_weights
2024-06-09T01:17:34.651286187Z param = params_dict[name]
2024-06-09T01:17:34.651290524Z KeyError: 'model.layers.19.self_attn.k_norm.weight'
2024-06-09T01:17:36.936923543Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] Error executing method load_model. This might cause deadlock in distributed execution. [repeated 3x across cluster]
2024-06-09T01:17:36.936946060Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] Traceback (most recent call last): [repeated 3x across cluster]
2024-06-09T01:17:36.936948507Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/ray_utils.py", line 37, in execute_method [repeated 3x across cluster]
2024-06-09T01:17:36.936959586Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] return executor(*args, **kwargs) [repeated 3x across cluster]
2024-06-09T01:17:36.936961715Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 95, in load_model [repeated 6x across cluster]
2024-06-09T01:17:36.936963598Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] self.model_runner.load_model() [repeated 3x across cluster]
2024-06-09T01:17:36.936965279Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] self.model = get_model( [repeated 3x across cluster]
2024-06-09T01:17:36.936966750Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 101, in get_model [repeated 3x across cluster]
2024-06-09T01:17:36.936968802Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] model.load_weights(model_config.model, model_config.download_dir, [repeated 3x across cluster]
2024-06-09T01:17:36.936970607Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/commandr.py", line 325, in load_weights [repeated 3x across cluster]
2024-06-09T01:17:36.936972282Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] param = params_dict[name] [repeated 3x across cluster]
2024-06-09T01:17:36.936974756Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] KeyError: 'model.layers.19.self_attn.k_norm.weight' [repeated 3x across cluster]
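If I read the trace right, the KeyError at the bottom is the real problem: the Command R+ checkpoint ships extra per-head QK-normalization tensors (q_norm/k_norm weights per attention layer) that the installed vLLM commandr.py never registers, so the `params_dict[name]` lookup in load_weights has no entry for them. A minimal sketch of that mismatch (the tensor names below are illustrative, not the full checkpoint):

```python
# Sketch of why vLLM's commandr.py raises KeyError on a Command R+
# checkpoint: the checkpoint contains q_norm/k_norm tensors the older
# Command-R implementation never builds. Names are illustrative.

expected_params = {  # what an older commandr.py registers (subset)
    "model.layers.19.self_attn.qkv_proj.weight",
    "model.layers.19.mlp.gate_up_proj.weight",
}
checkpoint_tensors = {  # what the Command R+ checkpoint ships (subset)
    "model.layers.19.self_attn.qkv_proj.weight",
    "model.layers.19.self_attn.q_norm.weight",
    "model.layers.19.self_attn.k_norm.weight",
    "model.layers.19.mlp.gate_up_proj.weight",
}

def unknown_tensors(checkpoint, params):
    """Tensor names in the checkpoint that the model never registered.

    load_weights does `param = params_dict[name]`, so the first of
    these triggers the KeyError seen in the log above.
    """
    return sorted(checkpoint - params)

print(unknown_tensors(checkpoint_tensors, expected_params))
# -> ['model.layers.19.self_attn.k_norm.weight',
#     'model.layers.19.self_attn.q_norm.weight']
```

If that is what is happening, upgrading to a vLLM release whose Command-R implementation knows about those norm weights should fix it, independent of quantization.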
There is a non-quantized version we could try; see the note here: https://huggingface.co/CohereForAI/c4ai-command-r-plus
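Before launching, it may be worth checking whether the model config actually requests QK normalization, since that is what the loader trips over. A tiny sketch (treat the `use_qk_norm` field name as an assumption based on the error above, not confirmed here):

```python
import json

def needs_qk_norm(config_text: str) -> bool:
    """True if the model's config.json enables per-head QK normalization.

    The "use_qk_norm" key is an assumption about the Command R+ config;
    verify against the model card before relying on it.
    """
    return bool(json.loads(config_text).get("use_qk_norm", False))

# Hypothetical excerpt of a Command R+-style config.json:
sample_config = '{"model_type": "cohere", "use_qk_norm": true}'
print(needs_qk_norm(sample_config))  # -> True
```

If it returns True for the checkpoint we pick, we need a vLLM build whose Command-R code defines those norm parameters, whichever quantization we go with.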