lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Aya-101 killing vLLM #3050

Open surak opened 9 months ago

surak commented 9 months ago

This is the vLLM worker failing while loading Aya-101:

2024-02-15 16:14:20 | ERROR | stderr | Traceback (most recent call last):
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/fastchat/serve/vllm_worker.py", line 290, in <module>
2024-02-15 16:14:20 | ERROR | stderr |     engine = AsyncLLMEngine.from_engine_args(engine_args)
2024-02-15 16:14:20 | ERROR | stderr |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 495, in from_engine_args
2024-02-15 16:14:20 | ERROR | stderr |     engine = cls(parallel_config.worker_use_ray,
2024-02-15 16:14:20 | ERROR | stderr |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 269, in __init__
2024-02-15 16:14:20 | ERROR | stderr |     self.engine = self._init_engine(*args, **kwargs)
2024-02-15 16:14:20 | ERROR | stderr |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 314, in _init_engine
2024-02-15 16:14:20 | ERROR | stderr |     return engine_class(*args, **kwargs)
2024-02-15 16:14:20 | ERROR | stderr |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 107, in __init__
2024-02-15 16:14:20 | ERROR | stderr |     self._init_workers_ray(placement_group)
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 194, in _init_workers_ray
2024-02-15 16:14:20 | ERROR | stderr |     self._run_workers(
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 750, in _run_workers
2024-02-15 16:14:20 | ERROR | stderr |     self._run_workers_in_batch(workers, method, *args, **kwargs))
2024-02-15 16:14:20 | ERROR | stderr |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 727, in _run_workers_in_batch
2024-02-15 16:14:20 | ERROR | stderr |     all_outputs = ray.get(all_outputs)
2024-02-15 16:14:20 | ERROR | stderr |                   ^^^^^^^^^^^^^^^^^^^^
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
2024-02-15 16:14:20 | ERROR | stderr |     return fn(*args, **kwargs)
2024-02-15 16:14:20 | ERROR | stderr |            ^^^^^^^^^^^^^^^^^^^
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
2024-02-15 16:14:20 | ERROR | stderr |     return func(*args, **kwargs)
2024-02-15 16:14:20 | ERROR | stderr |            ^^^^^^^^^^^^^^^^^^^^^
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/ray/_private/worker.py", line 2624, in get
2024-02-15 16:14:20 | ERROR | stderr |     raise value.as_instanceof_cause()
2024-02-15 16:14:20 | ERROR | stderr | ray.exceptions.RayTaskError(ValueError): ray::RayWorkerVllm.execute_method() (pid=1550269, ip=134.94.1.44, actor_id=4803c3335806647426226c9501000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7f2dd969ee50>)
2024-02-15 16:14:20 | ERROR | stderr |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-15 16:14:20 | ERROR | stderr |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/vllm/engine/ray_utils.py", line 32, in execute_method
2024-02-15 16:14:20 | ERROR | stderr |     return executor(*args, **kwargs)
2024-02-15 16:14:20 | ERROR | stderr |            ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/vllm/worker/worker.py", line 72, in load_model
2024-02-15 16:14:20 | ERROR | stderr |     self.model_runner.load_model()
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 36, in load_model
2024-02-15 16:14:20 | ERROR | stderr |     self.model = get_model(self.model_config)
2024-02-15 16:14:20 | ERROR | stderr |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/vllm/model_executor/model_loader.py", line 88, in get_model
2024-02-15 16:14:20 | ERROR | stderr |     model_class = _get_model_architecture(model_config.hf_config)
2024-02-15 16:14:20 | ERROR | stderr |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-15 16:14:20 | ERROR | stderr |   File "/p/haicluster/llama/FastChat/sc_venv_2024/venv/lib/python3.11/site-packages/vllm/model_executor/model_loader.py", line 82, in _get_model_architecture
2024-02-15 16:14:20 | ERROR | stderr |     raise ValueError(
2024-02-15 16:14:20 | ERROR | stderr | ValueError: Model architectures ['T5ForConditionalGeneration'] are not supported for now. Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'FalconForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'OPTForCausalLM', 'PhiForCausalLM', 'QWenLMHeadModel', 'RWForCausalLM', 'YiForCausalLM']
2024-02-15 16:14:21 | INFO | stdout | (RayWorkerVllm pid=1550268) MegaBlocks not found. Please install it by `pip install megablocks`. Note that MegaBlocks depends on mosaicml-turbo, which only supports Python 3.10 for now.
2024-02-15 16:14:21 | INFO | stdout | (RayWorkerVllm pid=1550268) STK not found: please see https://github.com/stanford-futuredata/stk
srun: error: haicluster1: task 0: Exited with exit code 1
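For what it's worth, the failure can be reproduced without launching the worker at all. A minimal sketch, assuming Aya-101's public Hugging Face ID is CohereForAI/aya-101, that prints the architecture string vLLM rejects:

```python
from transformers import AutoConfig

# Assumption: the checkpoint lives at CohereForAI/aya-101 on the Hugging Face Hub.
config = AutoConfig.from_pretrained("CohereForAI/aya-101")

# vLLM's model loader dispatches on this field; for Aya-101 it reports
# ['T5ForConditionalGeneration'], which is not in vLLM's supported list.
print(config.architectures)
```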
thiner commented 8 months ago

The problem is that Aya-101 is based on a different architecture. Its model card states:

Architecture: Same as mt5-xxl

i.e. an encoder-decoder model (T5ForConditionalGeneration), while every architecture in vLLM's supported list above is decoder-only.
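Since that version of vLLM only loads decoder-only models, one workaround is to serve Aya-101 through a backend that handles encoder-decoder checkpoints, e.g. plain Transformers. A minimal sketch, again assuming the CohereForAI/aya-101 hub ID:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "CohereForAI/aya-101"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)

# T5/mT5-style checkpoints load via the seq2seq class, not AutoModelForCausalLM.
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer(
    "Translate to English: Ich mag Sprachmodelle.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

FastChat's default fastchat.serve.model_worker is backed by Transformers rather than vLLM (it already serves the T5-based fastchat-t5), so it should be able to host Aya-101 the same way.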