Open ggbetz opened 1 month ago
Possible issue with vLLM and transformers when loading a Falcon model (KeyError in `load_weights`):
2024-06-09T00:56:34.810614509Z Traceback (most recent call last):
2024-06-09T00:56:34.810640753Z File "/usr/local/bin/cot-eval", line 8, in <module>
2024-06-09T00:56:34.810670576Z sys.exit(main())
2024-06-09T00:56:34.810676624Z File "/workspace/cot-eval/src/cot_eval/__main__.py", line 149, in main
2024-06-09T00:56:34.810719213Z llm = VLLM(
2024-06-09T00:56:34.810737842Z File "/usr/local/lib/python3.10/dist-packages/langchain_core/load/serializable.py", line 120, in __init__
2024-06-09T00:56:34.810753544Z super().__init__(**kwargs)
2024-06-09T00:56:34.810762650Z File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 339, in __init__
2024-06-09T00:56:34.810825823Z values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
2024-06-09T00:56:34.810833903Z File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 1102, in validate_model
2024-06-09T00:56:34.810972709Z values = validator(cls_, values)
2024-06-09T00:56:34.810989335Z File "/usr/local/lib/python3.10/dist-packages/langchain_community/llms/vllm.py", line 88, in validate_environment
2024-06-09T00:56:34.810996428Z values["client"] = VLLModel(
2024-06-09T00:56:34.810998644Z File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 112, in __init__
2024-06-09T00:56:34.811023086Z self.llm_engine = LLMEngine.from_engine_args(
2024-06-09T00:56:34.811028309Z File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 196, in from_engine_args
2024-06-09T00:56:34.811084375Z engine = cls(
2024-06-09T00:56:34.811087001Z File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 110, in __init__
2024-06-09T00:56:34.811088759Z self.model_executor = executor_class(model_config, cache_config,
2024-06-09T00:56:34.811091133Z File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 62, in __init__
2024-06-09T00:56:34.811126708Z self._init_workers_ray(placement_group)
2024-06-09T00:56:34.811132405Z File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 192, in _init_workers_ray
2024-06-09T00:56:34.811168916Z self._run_workers(
2024-06-09T00:56:34.811174082Z File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 324, in _run_workers
2024-06-09T00:56:34.811209017Z driver_worker_output = getattr(self.driver_worker,
2024-06-09T00:56:34.811214567Z File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 107, in load_model
2024-06-09T00:56:34.811242529Z self.model_runner.load_model()
2024-06-09T00:56:34.811247480Z File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 95, in load_model
2024-06-09T00:56:34.811256892Z self.model = get_model(
2024-06-09T00:56:34.811258664Z File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 101, in get_model
2024-06-09T00:56:34.811294265Z model.load_weights(model_config.model, model_config.download_dir,
2024-06-09T00:56:34.811301734Z File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/falcon.py", line 425, in load_weights
2024-06-09T00:56:34.811364589Z param = params_dict[name]
2024-06-09T00:56:34.811377353Z KeyError: 'transformer.h.26.input_layernorm.weight'
2024-06-09T00:56:36.929909774Z (RayWorkerVllm pid=10701) ERROR 06-09 00:56:34 ray_utils.py:44] Error executing method load_model. This might cause deadlock in distributed execution.
2024-06-09T00:56:36.929931343Z (RayWorkerVllm pid=10701) ERROR 06-09 00:56:34 ray_utils.py:44] Traceback (most recent call last):
2024-06-09T00:56:36.929933284Z (RayWorkerVllm pid=10701) ERROR 06-09 00:56:34 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/ray_utils.py", line 37, in execute_method
2024-06-09T00:56:36.929935669Z (RayWorkerVllm pid=10701) ERROR 06-09 00:56:34 ray_utils.py:44] return executor(*args, **kwargs)
2024-06-09T00:56:36.929937002Z (RayWorkerVllm pid=10701) ERROR 06-09 00:56:34 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 107, in load_model
2024-06-09T00:56:36.929938537Z (RayWorkerVllm pid=10701) ERROR 06-09 00:56:34 ray_utils.py:44] self.model_runner.load_model()
2024-06-09T00:56:36.929939834Z (RayWorkerVllm pid=10701) ERROR 06-09 00:56:34 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 95, in load_model
2024-06-09T00:56:36.929941274Z (RayWorkerVllm pid=10701) ERROR 06-09 00:56:34 ray_utils.py:44] self.model = get_model(
2024-06-09T00:56:36.929942571Z (RayWorkerVllm pid=10701) ERROR 06-09 00:56:34 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 101, in get_model
2024-06-09T00:56:36.929944200Z (RayWorkerVllm pid=10701) ERROR 06-09 00:56:34 ray_utils.py:44] model.load_weights(model_config.model, model_config.download_dir,
2024-06-09T00:56:36.929945603Z (RayWorkerVllm pid=10701) ERROR 06-09 00:56:34 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/falcon.py", line 425, in load_weights
2024-06-09T00:56:36.929947310Z (RayWorkerVllm pid=10701) ERROR 06-09 00:56:34 ray_utils.py:44] param = params_dict[name]
2024-06-09T00:56:36.929948607Z (RayWorkerVllm pid=10701) ERROR 06-09 00:56:34 ray_utils.py:44] KeyError: 'transformer.h.26.input_layernorm.weight'
2024-06-09T00:56:36.929950064Z (RayWorkerVllm pid=10931) INFO 06-09 00:56:03 weight_utils.py:177] Using model weights format ['*.safetensors'] [repeated 2x across cluster]
2024-06-09T00:56:36.929952403Z (RayWorkerVllm pid=10931) ERROR 06-09 00:56:35 ray_utils.py:44] Error executing method load_model. This might cause deadlock in distributed execution. [repeated 2x across cluster]
2024-06-09T00:56:36.929953926Z (RayWorkerVllm pid=10931) ERROR 06-09 00:56:35 ray_utils.py:44] Traceback (most recent call last): [repeated 2x across cluster]
2024-06-09T00:56:36.929955302Z (RayWorkerVllm pid=10931) ERROR 06-09 00:56:35 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/ray_utils.py", line 37, in execute_method [repeated 2x across cluster]
2024-06-09T00:56:36.929964983Z (RayWorkerVllm pid=10931) ERROR 06-09 00:56:35 ray_utils.py:44] return executor(*args, **kwargs) [repeated 2x across cluster]
2024-06-09T00:56:36.929966426Z (RayWorkerVllm pid=10931) ERROR 06-09 00:56:35 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 95, in load_model [repeated 4x across cluster]
2024-06-09T00:56:36.929968052Z (RayWorkerVllm pid=10931) ERROR 06-09 00:56:35 ray_utils.py:44] self.model_runner.load_model() [repeated 2x across cluster]
2024-06-09T00:56:36.929969420Z (RayWorkerVllm pid=10931) ERROR 06-09 00:56:35 ray_utils.py:44] self.model = get_model( [repeated 2x across cluster]
2024-06-09T00:56:36.929970765Z (RayWorkerVllm pid=10931) ERROR 06-09 00:56:35 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 101, in get_model [repeated 2x across cluster]
2024-06-09T00:56:36.929972269Z (RayWorkerVllm pid=10931) ERROR 06-09 00:56:35 ray_utils.py:44] model.load_weights(model_config.model, model_config.download_dir, [repeated 2x across cluster]
2024-06-09T00:56:36.929973771Z (RayWorkerVllm pid=10931) ERROR 06-09 00:56:35 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/falcon.py", line 425, in load_weights [repeated 2x across cluster]
2024-06-09T00:56:36.929975266Z (RayWorkerVllm pid=10931) ERROR 06-09 00:56:35 ray_utils.py:44] param = params_dict[name] [repeated 2x across cluster]
2024-06-09T00:56:36.929976815Z (RayWorkerVllm pid=10931) ERROR 06-09 00:56:35 ray_utils.py:44] KeyError: 'transformer.h.26.input_layernorm.weight' [repeated 2x across cluster]
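
The KeyError is raised at `param = params_dict[name]`, which suggests a tensor name being loaded (presumably read from the checkpoint) has no matching entry among the model's parameter names. A minimal sketch of that comparison, assuming both name lists are available; `unmatched_weight_names` and the toy names below are hypothetical and not part of vLLM or transformers, and the sketch does not claim to identify the actual cause here:

```python
# Hypothetical diagnostic helper (not a vLLM API): report tensor names that
# would raise KeyError on a lookup like `params_dict[name]`.
def unmatched_weight_names(checkpoint_names, param_names):
    """Return the checkpoint names that are absent from the parameter names."""
    known = set(param_names)
    return sorted(name for name in checkpoint_names if name not in known)


# Toy data for illustration only; real names would come from the checkpoint
# files and from the instantiated model.
param_names = [
    "transformer.word_embeddings.weight",
    "transformer.h.0.input_layernorm.weight",
]
checkpoint_names = param_names + ["transformer.h.26.input_layernorm.weight"]

unmatched = unmatched_weight_names(checkpoint_names, param_names)
print(unmatched)
```

An empty result would mean every checkpoint tensor has a destination parameter; any name listed is one that a plain dict lookup would fail on.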
Check upon issue creation:
Parameters:
ToDos: