vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: seq_group_metadata.encoder_seq_data.get_len() AttributeError: 'NoneType' object has no attribute 'get_len' #9878

Open bingwork opened 2 hours ago

bingwork commented 2 hours ago

Your current environment

Due to network isolation, I am currently unable to run scripts. I am using 8x H100 80GB GPUs. The run command is:

vllm serve /models/Llama-3.2-90B-Vision-Instruct/ --dtype auto --tensor_parallel_size 8 --max-num-seqs 32 --enforce-eager --gpu_memory_utilization 0.95 --max_model_len 8192 --max_seq_len_to_capture 8192 --speculative_model "[ngram]" --num_speculative_tokens 5 --ngram_prompt_lookup_max 4 --use_v2_block_manager

Model Input Dumps

No response

🐛 Describe the bug

(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks.
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229] Traceback (most recent call last):
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]   File "/root/anaconda3/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]     output = executor(*args, **kwargs)
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]   File "/root/anaconda3/lib/python3.9/site-packages/vllm/spec_decode/spec_decode_worker.py", line 361, in determine_num_available_blocks
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]     self.scorer_worker.determine_num_available_blocks())
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]   File "/root/anaconda3/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]     return func(*args, **kwargs)
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]   File "/root/anaconda3/lib/python3.9/site-packages/vllm/worker/worker.py", line 223, in determine_num_available_blocks
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]     self.model_runner.profile_run()
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]   File "/root/anaconda3/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]     return func(*args, **kwargs)
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]   File "/root/anaconda3/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1289, in profile_run
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]     model_input = self.prepare_model_input(
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]   File "/root/anaconda3/lib/python3.9/site-packages/vllm/spec_decode/target_model_runner.py", line 60, in prepare_model_input
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]     model_input: ModelInputForGPUWithSamplingMetadata = super(
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]   File "/root/anaconda3/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1586, in prepare_model_input
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]     model_input = self._prepare_model_input_tensors(
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]   File "/root/anaconda3/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1192, in _prepare_model_input_tensors
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]     builder.add_seq_group(seq_group_metadata)
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]   File "/root/anaconda3/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 693, in add_seq_group
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229]     encoder_seq_len = seq_group_metadata.encoder_seq_data.get_len()
(VllmWorkerProcess pid=489) ERROR 10-31 08:09:05 multiproc_worker_utils.py:229] AttributeError: 'NoneType' object has no attribute 'get_len'

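For anyone skimming the traceback: the last frame fails because `seq_group_metadata.encoder_seq_data` is `None` when `add_seq_group` calls `.get_len()` on it during the profile run. Below is a minimal sketch of that failure pattern, written with stand-in classes. `FakeSeqData`, `FakeSeqGroupMetadata`, `encoder_len_unchecked`, `encoder_len_or_zero`, and `reproduce_error` are all hypothetical names for illustration, not vLLM code, and treating a missing encoder sequence as length 0 is an assumption rather than a confirmed fix.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeSeqData:
    """Hypothetical stand-in for a sequence-data object."""
    token_ids: List[int]

    def get_len(self) -> int:
        return len(self.token_ids)


@dataclass
class FakeSeqGroupMetadata:
    """Hypothetical stand-in; encoder_seq_data may be absent (None)."""
    encoder_seq_data: Optional[FakeSeqData] = None


def encoder_len_unchecked(meta: FakeSeqGroupMetadata) -> int:
    # Same access pattern as the last frame of the traceback: no None check.
    return meta.encoder_seq_data.get_len()


def encoder_len_or_zero(meta: FakeSeqGroupMetadata) -> int:
    # Defensive variant (assumption): a missing encoder sequence counts as 0.
    if meta.encoder_seq_data is None:
        return 0
    return meta.encoder_seq_data.get_len()


def reproduce_error() -> str:
    """Return the message raised by the unchecked access on None."""
    try:
        encoder_len_unchecked(FakeSeqGroupMetadata())
    except AttributeError as exc:
        return str(exc)
    return ""
```

Calling `reproduce_error()` yields the same `AttributeError` text as the log, while `encoder_len_or_zero` returns a length without raising. Whether a guard like this is appropriate inside vLLM, or whether the speculative-decoding path should populate `encoder_seq_data` instead, is for the maintainers to decide.
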

bingwork commented 2 hours ago

I get the same error when running the command below, which uses a draft model instead of ngram speculation:

vllm serve /models/Llama-3.2-90B-Vision-Instruct/ --dtype auto --tensor_parallel_size 8 --max-num-seqs 32 --enforce-eager --gpu_memory_utilization 0.95 --max_model_len 8192 --max_seq_len_to_capture 8192 --speculative_model /models/Llama-3.2-11B-Vision-Instruct/ --num_speculative_tokens 5 --use_v2_block_manager --speculative_draft_tensor_parallel_size 1
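
To make the difference between the two commands in this thread easier to see, here is a small sketch that parses both and lists the flags that differ. `parse_flags` and `differing_flags` are hypothetical helper names written for this comparison; they use only the Python standard library and do not call vLLM.

```python
import shlex
from typing import Dict, List, Union

FlagValue = Union[str, bool]

# The two commands exactly as reported in this issue.
NGRAM_CMD = (
    'vllm serve /models/Llama-3.2-90B-Vision-Instruct/ --dtype auto '
    '--tensor_parallel_size 8 --max-num-seqs 32 --enforce-eager '
    '--gpu_memory_utilization 0.95 --max_model_len 8192 '
    '--max_seq_len_to_capture 8192 --speculative_model "[ngram]" '
    '--num_speculative_tokens 5 --ngram_prompt_lookup_max 4 '
    '--use_v2_block_manager'
)
DRAFT_CMD = (
    'vllm serve /models/Llama-3.2-90B-Vision-Instruct/ --dtype auto '
    '--tensor_parallel_size 8 --max-num-seqs 32 --enforce-eager '
    '--gpu_memory_utilization 0.95 --max_model_len 8192 '
    '--max_seq_len_to_capture 8192 '
    '--speculative_model /models/Llama-3.2-11B-Vision-Instruct/ '
    '--num_speculative_tokens 5 --use_v2_block_manager '
    '--speculative_draft_tensor_parallel_size 1'
)


def parse_flags(cmd: str) -> Dict[str, FlagValue]:
    """Map each --flag to its value, or to True for value-less switches."""
    tokens = shlex.split(cmd)
    flags: Dict[str, FlagValue] = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
            flags[tok] = tokens[i + 1] if has_value else True
            i += 2 if has_value else 1
        else:
            i += 1  # skip positional tokens such as "vllm", "serve", model path
    return flags


def differing_flags(cmd_a: str, cmd_b: str) -> List[str]:
    """Return sorted flags that are missing from one command or differ in value."""
    a, b = parse_flags(cmd_a), parse_flags(cmd_b)
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))
```

`differing_flags(NGRAM_CMD, DRAFT_CMD)` reports only `--ngram_prompt_lookup_max`, `--speculative_draft_tensor_parallel_size`, and `--speculative_model`. Everything else is identical, which suggests (but does not prove) that the error is tied to enabling speculative decoding with this model rather than to a particular draft method.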