ggbetz opened 3 months ago
Is this the 104-billion-parameter model?
I fear so.
Maybe postpone this until we have together.ai support?
Yeah, I don't think it will be that easy to run.
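For scale, a back-of-the-envelope memory estimate (weights only, fp16/bf16 assumed; the KV cache and activations need extra headroom on top of this):

```python
import math

# Rough sketch, not a benchmark: 104B parameters at 2 bytes each
# (fp16/bf16) for the weights alone, before KV cache and activations.
params = 104e9
bytes_per_param = 2
weights_gb = params * bytes_per_param / 1e9

h100_gb = 80  # H100 80GB
min_gpus = math.ceil(weights_gb / h100_gb)
print(f"{weights_gb:.0f} GB of weights -> at least {min_gpus}x H100 80GB")
# -> 208 GB of weights -> at least 3x H100 80GB (plus cache headroom)
```

So a handful of H100s is indeed the right ballpark for the unquantized weights.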
I tried to run it on some H100s, which should be enough, but there is probably a vLLM issue here:
2024-06-09T01:17:34.650526432Z Traceback (most recent call last):
2024-06-09T01:17:34.650550937Z File "/usr/local/bin/cot-eval", line 8, in <module>
2024-06-09T01:17:34.650625978Z sys.exit(main())
2024-06-09T01:17:34.650629280Z File "/workspace/cot-eval/src/cot_eval/__main__.py", line 149, in main
2024-06-09T01:17:34.650684105Z llm = VLLM(
2024-06-09T01:17:34.650688295Z File "/usr/local/lib/python3.10/dist-packages/langchain_core/load/serializable.py", line 120, in __init__
2024-06-09T01:17:34.650708283Z super().__init__(**kwargs)
2024-06-09T01:17:34.650710047Z File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 339, in __init__
2024-06-09T01:17:34.650779281Z values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
2024-06-09T01:17:34.650781373Z File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 1102, in validate_model
2024-06-09T01:17:34.650913230Z values = validator(cls_, values)
2024-06-09T01:17:34.650915258Z File "/usr/local/lib/python3.10/dist-packages/langchain_community/llms/vllm.py", line 88, in validate_environment
2024-06-09T01:17:34.650935652Z values["client"] = VLLModel(
2024-06-09T01:17:34.650937500Z File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 112, in __init__
2024-06-09T01:17:34.650972176Z self.llm_engine = LLMEngine.from_engine_args(
2024-06-09T01:17:34.650974087Z File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 196, in from_engine_args
2024-06-09T01:17:34.651018637Z engine = cls(
2024-06-09T01:17:34.651020560Z File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 110, in __init__
2024-06-09T01:17:34.651040423Z self.model_executor = executor_class(model_config, cache_config,
2024-06-09T01:17:34.651042706Z File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 62, in __init__
2024-06-09T01:17:34.651070210Z self._init_workers_ray(placement_group)
2024-06-09T01:17:34.651072488Z File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 192, in _init_workers_ray
2024-06-09T01:17:34.651097110Z self._run_workers(
2024-06-09T01:17:34.651098955Z File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 324, in _run_workers
2024-06-09T01:17:34.651152655Z driver_worker_output = getattr(self.driver_worker,
2024-06-09T01:17:34.651154577Z File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 107, in load_model
2024-06-09T01:17:34.651173302Z self.model_runner.load_model()
2024-06-09T01:17:34.651174918Z File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 95, in load_model
2024-06-09T01:17:34.651205321Z self.model = get_model(
2024-06-09T01:17:34.651206893Z File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 101, in get_model
2024-06-09T01:17:34.651226346Z model.load_weights(model_config.model, model_config.download_dir,
2024-06-09T01:17:34.651228052Z File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/commandr.py", line 325, in load_weights
2024-06-09T01:17:34.651286187Z param = params_dict[name]
2024-06-09T01:17:34.651290524Z KeyError: 'model.layers.19.self_attn.k_norm.weight'
2024-06-09T01:17:36.936923543Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] Error executing method load_model. This might cause deadlock in distributed execution. [repeated 3x across cluster]
2024-06-09T01:17:36.936946060Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] Traceback (most recent call last): [repeated 3x across cluster]
2024-06-09T01:17:36.936948507Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/ray_utils.py", line 37, in execute_method [repeated 3x across cluster]
2024-06-09T01:17:36.936959586Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] return executor(*args, **kwargs) [repeated 3x across cluster]
2024-06-09T01:17:36.936961715Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 95, in load_model [repeated 6x across cluster]
2024-06-09T01:17:36.936963598Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] self.model_runner.load_model() [repeated 3x across cluster]
2024-06-09T01:17:36.936965279Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] self.model = get_model( [repeated 3x across cluster]
2024-06-09T01:17:36.936966750Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 101, in get_model [repeated 3x across cluster]
2024-06-09T01:17:36.936968802Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] model.load_weights(model_config.model, model_config.download_dir, [repeated 3x across cluster]
2024-06-09T01:17:36.936970607Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/commandr.py", line 325, in load_weights [repeated 3x across cluster]
2024-06-09T01:17:36.936972282Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] param = params_dict[name] [repeated 3x across cluster]
2024-06-09T01:17:36.936974756Z (RayWorkerVllm pid=11138) ERROR 06-09 01:17:35 ray_utils.py:44] KeyError: 'model.layers.19.self_attn.k_norm.weight' [repeated 3x across cluster]
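If I read the trace right, the KeyError at the bottom is the real problem: the Command R+ checkpoint ships extra per-head QK-normalization tensors (q_norm/k_norm weights per attention layer) that the installed vLLM commandr.py never registers, so the `params_dict[name]` lookup in load_weights has no entry for them. A minimal sketch of that mismatch (the tensor names below are illustrative, not the full checkpoint):

```python
# Sketch of why vLLM's commandr.py raises KeyError on a Command R+
# checkpoint: the checkpoint contains q_norm/k_norm tensors the older
# Command-R implementation never builds. Names are illustrative.

expected_params = {  # what an older commandr.py registers (subset)
    "model.layers.19.self_attn.qkv_proj.weight",
    "model.layers.19.mlp.gate_up_proj.weight",
}
checkpoint_tensors = {  # what the Command R+ checkpoint ships (subset)
    "model.layers.19.self_attn.qkv_proj.weight",
    "model.layers.19.self_attn.q_norm.weight",
    "model.layers.19.self_attn.k_norm.weight",
    "model.layers.19.mlp.gate_up_proj.weight",
}

def unknown_tensors(checkpoint, params):
    """Tensor names in the checkpoint that the model never registered.

    load_weights does `param = params_dict[name]`, so the first of
    these triggers the KeyError seen in the log above.
    """
    return sorted(checkpoint - params)

print(unknown_tensors(checkpoint_tensors, expected_params))
# -> ['model.layers.19.self_attn.k_norm.weight',
#     'model.layers.19.self_attn.q_norm.weight']
```

If that is what is happening, upgrading to a vLLM release whose Command-R implementation knows about those norm weights should fix it, independent of quantization.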
There is a non-quantized version we could try; see the note here: https://huggingface.co/CohereForAI/c4ai-command-r-plus
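Before launching, it may be worth checking whether the model config actually requests QK normalization, since that is what the loader trips over. A tiny sketch (treat the `use_qk_norm` field name as an assumption based on the error above, not confirmed here):

```python
import json

def needs_qk_norm(config_text: str) -> bool:
    """True if the model's config.json enables per-head QK normalization.

    The "use_qk_norm" key is an assumption about the Command R+ config;
    verify against the model card before relying on it.
    """
    return bool(json.loads(config_text).get("use_qk_norm", False))

# Hypothetical excerpt of a Command R+-style config.json:
sample_config = '{"model_type": "cohere", "use_qk_norm": true}'
print(needs_qk_norm(sample_config))  # -> True
```

If it returns True for the checkpoint we pick, we need a vLLM build whose Command-R code defines those norm parameters, whichever quantization we go with.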