intelligent-machine-learning / glake

GLake: optimizing GPU memory management and IO transmission.
Apache License 2.0

glakeServe run failed: TypeError: FlashAttentionMetadata.__init__() got an unexpected keyword argument 'is_extend' #27

Closed Rainlin007 closed 2 weeks ago

Rainlin007 commented 2 weeks ago

I followed the guide and ran glakeServe, and got this error:

```
ERROR 09-02 02:13:01 worker_base.py:145] Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution.
ERROR 09-02 02:13:01 worker_base.py:145] Traceback (most recent call last):
ERROR 09-02 02:13:01 worker_base.py:145]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.4.2+cu123-py3.10-linux-x86_64.egg/vllm/worker/worker_base.py", line 137, in execute_method
ERROR 09-02 02:13:01 worker_base.py:145]     return executor(*args, **kwargs)
ERROR 09-02 02:13:01 worker_base.py:145]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 09-02 02:13:01 worker_base.py:145]     return func(*args, **kwargs)
ERROR 09-02 02:13:01 worker_base.py:145]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.4.2+cu123-py3.10-linux-x86_64.egg/vllm/worker/worker.py", line 157, in determine_num_available_blocks
ERROR 09-02 02:13:01 worker_base.py:145]     self.model_runner.profile_run()
ERROR 09-02 02:13:01 worker_base.py:145]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 09-02 02:13:01 worker_base.py:145]     return func(*args, **kwargs)
ERROR 09-02 02:13:01 worker_base.py:145]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.4.2+cu123-py3.10-linux-x86_64.egg/vllm/worker/model_runner.py", line 847, in profile_run
ERROR 09-02 02:13:01 worker_base.py:145]     self.execute_model(seqs, kv_caches, [])
ERROR 09-02 02:13:01 worker_base.py:145]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 09-02 02:13:01 worker_base.py:145]     return func(*args, **kwargs)
ERROR 09-02 02:13:01 worker_base.py:145]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.4.2+cu123-py3.10-linux-x86_64.egg/vllm/worker/model_runner.py", line 735, in execute_model
ERROR 09-02 02:13:01 worker_base.py:145]     ) = self.prepare_input_tensors(seq_group_metadata_list, extend_list)
ERROR 09-02 02:13:01 worker_base.py:145]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.4.2+cu123-py3.10-linux-x86_64.egg/vllm/worker/model_runner.py", line 672, in prepare_input_tensors
ERROR 09-02 02:13:01 worker_base.py:145]     ) = self._prepare_model_input(seq_group_metadata_list, extend_list)
ERROR 09-02 02:13:01 worker_base.py:145]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.4.2+cu123-py3.10-linux-x86_64.egg/vllm/worker/model_runner.py", line 598, in _prepare_model_input
ERROR 09-02 02:13:01 worker_base.py:145]     attn_metadata = self.attn_backend.make_metadata(
ERROR 09-02 02:13:01 worker_base.py:145]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.4.2+cu123-py3.10-linux-x86_64.egg/vllm/attention/backends/flash_attn.py", line 31, in make_metadata
ERROR 09-02 02:13:01 worker_base.py:145]     return FlashAttentionMetadata(*args, **kwargs)
ERROR 09-02 02:13:01 worker_base.py:145] TypeError: FlashAttentionMetadata.__init__() got an unexpected keyword argument 'is_extend'
```
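For context, here is a minimal sketch (using a stand-in stub class, not the real vLLM `FlashAttentionMetadata`) of why this TypeError occurs: the glakeServe-patched `model_runner.py` forwards an `is_extend` keyword through `make_metadata(...)`, but the installed backend's metadata dataclass has no such field, so its generated `__init__` rejects it.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the installed backend's metadata class.
# A dataclass's generated __init__ only accepts its declared fields.
@dataclass
class FlashAttentionMetadataStub:
    num_prefill_tokens: int = 0  # illustrative field, not the real schema

# The patched caller forwards an extra keyword the class does not declare:
kwargs = {"num_prefill_tokens": 1, "is_extend": False}

try:
    FlashAttentionMetadataStub(**kwargs)
except TypeError as e:
    # Same shape of error as in the traceback above:
    # "... got an unexpected keyword argument 'is_extend'"
    print(e)
```

This pattern usually indicates a version mismatch: the caller was modified to pass a new argument, but the installed `vllm` package still ships the unmodified `flash_attn.py` backend.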

Rainlin007 commented 2 weeks ago

Should I select the FlashAttention-2 backend or the xFormers backend? When I select the FlashAttention backend I get the error above, but xFormers works fine.
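If xFormers works, one way to pin it (assuming glakeServe inherits stock vLLM's backend selection, which reads the `VLLM_ATTENTION_BACKEND` environment variable at startup) is to set the variable before launching the server; this sidesteps the `is_extend` mismatch in the FlashAttention backend:

```shell
# Force vLLM's xFormers attention backend (assumes stock vLLM env-var handling).
export VLLM_ATTENTION_BACKEND=XFORMERS
echo "$VLLM_ATTENTION_BACKEND"
# then launch glakeServe as in the guide
```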