vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Runtime exception [step must be nonzero] #2933

Open DreamGenX opened 6 months ago

DreamGenX commented 6 months ago

Somehow max_prompt_len may be 0 in this code: https://github.com/vllm-project/vllm/blob/264017a2bf030f060ebad91eb9be9b4e0033edb9/vllm/worker/model_runner.py#L232

    |   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 29, in _raise_exception_on_finish           [32/1990]
    |     task.result()                                                                                                                           
    |   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 411, in run_engine_loop                              
    |     has_requests_in_progress = await self.engine_step()                                                                                     
    |   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 390, in engine_step                                  
    |     request_outputs = await self.engine.step_async()                                                                                        
    |   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 189, in step_async                                   
    |     all_outputs = await self._run_workers_async(                                                                                            
    |   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 276, in _run_workers_async                           
    |     all_outputs = await asyncio.gather(*coros)                                                                                              
    |   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run                                                                  
    |     result = self.fn(*self.args, **self.kwargs)                                                                                             
    |   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context                                  
    |     return func(*args, **kwargs)                                                                                                            
    |   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 225, in execute_model                                          
    |     output = self.model_runner.execute_model(seq_group_metadata_list,                                                                       
    |   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context                                  
    |     return func(*args, **kwargs)                                                                                                            
    |   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 553, in execute_model                                    
    |     lora_mapping) = self.prepare_input_tensors(seq_group_metadata_list)                                                                     
    |   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 473, in prepare_input_tensors                            
    |     lora_requests) = self._prepare_prompt(seq_group_metadata_list)                                                                          
    |   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 232, in _prepare_prompt                                  
    |     start_loc_tensor = torch.arange(0,                                                                                                      
    | RuntimeError: step must be nonzero         
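
For reference, a minimal sketch of how the step ends up being 0. The variable names and the exact shape of the `arange` call below are my guess at what L232 does, not a copy of it:

```python
import torch

# If every prompt in the batch is empty, max(prompt_lens) is 0, so the step
# passed to torch.arange is 0 and PyTorch raises "step must be nonzero".
prompt_lens = [0]                  # e.g. one empty prompt slipped through
max_prompt_len = max(prompt_lens)  # 0
start_loc = torch.arange(0, len(prompt_lens) * max_prompt_len, max_prompt_len)
# RuntimeError: step must be nonzero
```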
lzhfe commented 6 months ago

I encountered the same problem. Model: qwen-72b-chat-int4, vLLM: 0.3.1.

lzhfe commented 6 months ago

I solved it. It was because I passed in an empty prompt by mistake.
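
In case it helps anyone else, a minimal client-side guard (just a sketch, not part of vLLM):

```python
def validate_prompt(prompt: str) -> str:
    # Reject empty or whitespace-only prompts before they reach the engine,
    # where they would otherwise surface as "RuntimeError: step must be nonzero".
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return prompt
```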

NaCloudAI commented 6 months ago

I can confirm this issue exists even when the input is not empty.

The following is my payload:

@WoosukKwon


curl -X 'POST' \
  'https://xxxxxxxxxxxxx.net/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "test/7b",
  "prompt": "abc",
  "max_tokens": 16,
  "temperature": 1,
  "top_p": 0.36,
  "stream": false,
  "top_k": 20,
  "ignore_eos": false,
  "use_beam_search": false,
  "stop_token_ids": [
    0
  ],
  "skip_special_tokens": true,
  "spaces_between_special_tokens": true,
  "repetition_penalty": 1,
  "min_p": 0,
  "include_stop_str_in_output": false,
  "length_penalty": 1
}'
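
One thing worth ruling out is whether the prompt survives tokenization at all; a quick check (the model name below is only a stand-in for the actual `test/7b` deployment):

```python
from transformers import AutoTokenizer

# Stand-in model name; replace with the tokenizer your deployment actually uses.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
print(tok("abc").input_ids)  # an empty list here would explain max_prompt_len == 0
```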
theobjectivedad commented 4 months ago

+1