This is similar to #8361. @Isotr0py can you look into this? I think the issue stems from different images potentially having different sizes even after postprocessing.
Seems that it's caused by different num_patches resulting from different image sizes, similar to #7392.
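A minimal sketch of the failure mode (the patch counts 7 and 13 are taken from the stack trace below; the random tensors are just stand-ins for preprocessed images):

```python
import torch

# InternVL's dynamic-resolution preprocessing tiles each image into a
# variable number of 448x448 patches, so two images of different sizes
# yield tensors whose first dimension (num_patches) differs.
img_a = torch.randn(7, 3, 448, 448)   # image tiled into 7 patches
img_b = torch.randn(13, 3, 448, 448)  # image tiled into 13 patches

# torch.stack requires every tensor to have the same shape, so batching
# these two images fails exactly as in the report:
try:
    torch.stack([img_a, img_b])
except RuntimeError as e:
    print(e)  # stack expects each tensor to be equal size, ...
```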
Your current environment
The environment was set up by pulling the main branch and building the Dockerfile. Hardware was 4x A100 on an Azure instance (Standard NC96ads A100 v4). The server image is ubuntu-hpc (2204).
Startup command:

```text
python3 -m vllm.entrypoints.openai.api_server --port=8000 --host=0.0.0.0 --chat-template="/docker_share/models/internVL2-template.jinja" --model="/fine_tunes/internvl2_76b_hermes2_llama3_70b_dynamic_res_2nd_finetune" --tensor-parallel-size=4 --max-model-len=8192 --trust_remote_code --enforce-eager --max-lora-rank 128 --limit-mm-per-prompt image=4
```
🐛 Describe the bug
I have built from source with the current main branch to use online multi-image inference with InternVL2 76B (fine-tuned). The first few inferences work with no issue; after roughly 10 calls the server crashes with the following stack trace.
The issue occurs with both multithreaded and single-threaded calls. Somehow the bug doesn't happen when I remove --max-lora-rank 128 and set --max-model-len=6000.
Stack trace
```text
ERROR 09-11 05:24:13 async_llm_engine.py:63] Engine background task failed
ERROR 09-11 05:24:13 async_llm_engine.py:63] Traceback (most recent call last):
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 53, in _log_task_completion
ERROR 09-11 05:24:13 async_llm_engine.py:63]     return_value = task.result()
ERROR 09-11 05:24:13 async_llm_engine.py:63]                    ^^^^^^^^^^^^^
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 939, in run_engine_loop
ERROR 09-11 05:24:13 async_llm_engine.py:63]     result = task.result()
ERROR 09-11 05:24:13 async_llm_engine.py:63]              ^^^^^^^^^^^^^
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 868, in engine_step
ERROR 09-11 05:24:13 async_llm_engine.py:63]     request_outputs = await self.engine.step_async(virtual_engine)
ERROR 09-11 05:24:13 async_llm_engine.py:63]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 345, in step_async
ERROR 09-11 05:24:13 async_llm_engine.py:63]     outputs = await self.model_executor.execute_model_async(
ERROR 09-11 05:24:13 async_llm_engine.py:63]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/distributed_gpu_executor.py", line 177, in execute_model_async
ERROR 09-11 05:24:13 async_llm_engine.py:63]     return await self._driver_execute_model_async(execute_model_req)
ERROR 09-11 05:24:13 async_llm_engine.py:63]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 231, in _driver_execute_model_async
ERROR 09-11 05:24:13 async_llm_engine.py:63]     return await self.driver_exec_model(execute_model_req)
ERROR 09-11 05:24:13 async_llm_engine.py:63]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
ERROR 09-11 05:24:13 async_llm_engine.py:63]     result = self.fn(*self.args, **self.kwargs)
ERROR 09-11 05:24:13 async_llm_engine.py:63]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 303, in execute_model
ERROR 09-11 05:24:13 async_llm_engine.py:63]     inputs = self.prepare_input(execute_model_req)
ERROR 09-11 05:24:13 async_llm_engine.py:63]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 291, in prepare_input
ERROR 09-11 05:24:13 async_llm_engine.py:63]     return self._get_driver_input_and_broadcast(execute_model_req)
ERROR 09-11 05:24:13 async_llm_engine.py:63]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
ERROR 09-11 05:24:13 async_llm_engine.py:63]     self.model_runner.prepare_model_input(
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1380, in prepare_model_input
ERROR 09-11 05:24:13 async_llm_engine.py:63]     model_input = self._prepare_model_input_tensors(
ERROR 09-11 05:24:13 async_llm_engine.py:63]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1038, in _prepare_model_input_tensors
ERROR 09-11 05:24:13 async_llm_engine.py:63]     builder.add_seq_group(seq_group_metadata)
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 664, in add_seq_group
ERROR 09-11 05:24:13 async_llm_engine.py:63]     per_seq_group_fn(inter_data, seq_group_metadata)
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 636, in _compute_multi_modal_input
ERROR 09-11 05:24:13 async_llm_engine.py:63]     mm_kwargs = self.multi_modal_input_mapper(mm_data)
ERROR 09-11 05:24:13 async_llm_engine.py:63]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 125, in map_input
ERROR 09-11 05:24:13 async_llm_engine.py:63]     input_dict = plugin.map_input(model_config, data_value)
ERROR 09-11 05:24:13 async_llm_engine.py:63]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/base.py", line 265, in map_input
ERROR 09-11 05:24:13 async_llm_engine.py:63]     return mapper(InputContext(model_config), data)
ERROR 09-11 05:24:13 async_llm_engine.py:63]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-11 05:24:13 async_llm_engine.py:63]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/internvl.py", line 279, in input_mapper_for_internvl
ERROR 09-11 05:24:13 async_llm_engine.py:63]     data = torch.stack(data)
ERROR 09-11 05:24:13 async_llm_engine.py:63]            ^^^^^^^^^^^^^^^^^
ERROR 09-11 05:24:13 async_llm_engine.py:63] RuntimeError: stack expects each tensor to be equal size, but got [7, 3, 448, 448] at entry 0 and [13, 3, 448, 448] at entry 1
Exception in callback functools.partial(
```
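Until the input mapper handles variable patch counts, a possible client-side workaround is to resize every image in a request to identical dimensions before sending it, so the dynamic tiling should produce the same num_patches for each image. A sketch under that assumption (the target size, JPEG format, and file names are illustrative, not from the original report):

```python
import base64
import io

from PIL import Image

def encode_resized(path: str, size: tuple[int, int] = (896, 896)) -> str:
    """Resize an image to a fixed size and return it base64-encoded."""
    with Image.open(path) as img:
        resized = img.convert("RGB").resize(size)
    buf = io.BytesIO()
    resized.save(buf, format="JPEG")
    return base64.b64encode(buf.getvalue()).decode()

# Every image in the request now has identical dimensions, so each one
# should map to the same patch count in the multimodal input mapper.
payload_images = [encode_resized(p) for p in ["a.jpg", "b.jpg"]]
```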