Open sayanbiswas59 opened 2 months ago
Which version of vLLM are you using? We fixed an error in the calculation a while back (#6982).
It would be great if you could provide the dimensions of the images which caused the error.
We are using vLLM 0.5.4, and we get the above error while running batch inference on the image bytes. Do the images need to be of a certain size for the feature size calculation to work properly?
We have also observed an issue where, if an exception is raised while executing an item in the batch, the pending items from the current batch accumulate in the next batch of the succeeding task. This causes subsequent tasks to fail due to overflow.
We are trying to find a solution that allows us to skip the problematic item and proceed with processing the remaining items in the batch. We have considered skipping the entire batch when an exception occurs, but that does not resolve the overflow issue.
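One workaround along these lines can be sketched in plain Python. Note that `infer` below is a hypothetical stand-in for the real batch inference call (e.g. a wrapper around `LLM.generate`), not vLLM's actual API; the point of the sketch is that failed items are recorded and dropped rather than re-queued into the next batch, which is what causes the overflow described above.

```python
# Sketch: run the whole batch; on failure, fall back to item-by-item
# processing so one bad input does not poison the rest of the batch.

def infer(items):
    # Hypothetical inference function for illustration: fails on any
    # item equal to "bad". Replace with the real batch inference call.
    if any(item == "bad" for item in items):
        raise ValueError("bad item in batch")
    return [f"result:{item}" for item in items]

def infer_with_skip(items):
    """Return (results, skipped). Failed items are recorded in `skipped`
    and dropped, never re-queued into a later batch."""
    try:
        return infer(items), []
    except Exception:
        results, skipped = [], []
        for item in items:
            try:
                results.extend(infer([item]))
            except Exception:
                skipped.append(item)  # record and move on
        return results, skipped

results, skipped = infer_with_skip(["a", "bad", "b"])
```

The per-item fallback costs a second pass over a failed batch, but it bounds the damage of a bad input to that one item instead of the whole job.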
Below is an example of the Actor log in which we observe the batch overflow after an exception. Batch size = 1000 (used for this run). The exception occurs while processing task 125/960, and then the count starts to overflow.
Processed prompts: 0%| | 0/960 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 0%| | 1/960 [00:47<12:33:23, 47.14s/it, est. speed input: 23.23 toks/s, output: 0.85 toks/s]
Processed prompts: 1%| | 10/960 [00:48<56:54, 3.59s/it, est. speed input: 476.70 toks/s, output: 8.18 toks/s]
Processed prompts: 13%|█▎ | 125/960 [01:58<13:11, 1.05it/s, est. speed input: 7946.17 toks/s, output: 104.40 toks/s]
Processed prompts: 0%| | 0/1603 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 0%| | 0/1603 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 0%| | 0/2371 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 0%| | 0/2371 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 0%| | 0/3139 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 0%| | 0/3139 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 0%| | 0/4123 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 0%| | 0/4123 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
.
.
.
Processed prompts: 0%| | 0/1377874 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 0%| | 0/1377874 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
We are currently blocked by the above errors, as they do not let us scale.
Does the same error occur if you use a smaller batch size? I think a large batch size may run into the issue of reaching max_model_len, which truncates the remaining image placeholder tokens and thus causes this error.
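The length budget implied here can be sketched as a pre-submission check. Both constants below are placeholder assumptions, not the model's real values: the actual per-image feature size for LLaVA-NeXT depends on the image's dimensions, and `max_model_len` is whatever the engine was configured with.

```python
# Sketch of a pre-submission length check. A prompt whose text tokens plus
# image feature tokens exceed max_model_len would be truncated mid-placeholder,
# which is the suspected failure mode.

FEATURE_TOKENS_PER_IMAGE = 2000  # assumption: substitute the model's real feature size
MAX_MODEL_LEN = 4096             # assumption: the engine's configured max_model_len

def fits_context(num_text_tokens: int, num_images: int) -> bool:
    total = num_text_tokens + num_images * FEATURE_TOKENS_PER_IMAGE
    return total <= MAX_MODEL_LEN
```

Prompts failing such a check could be skipped or resized up front rather than discovered at inference time.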
Yes, we have tried with even smaller batch sizes of 250 and the same issue persists.
> It would be great if you could provide the dimensions of the images which caused the error.

This would aid us greatly in debugging the issue.
While we try to get the dimensions of the images in the batch, could you please let us know how we can work around the second issue when there is an exception? The batch overflows, and that ultimately leads to job failure.
Can we abort/discard the batch when it raises an exception, or is there a way to skip the problematic item and proceed with processing the remaining items in the batch?
Can you share any sample data so we can reproduce?
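In the absence of shared data, a deterministic synthetic image of the reported shape can stand in for a reproduction. The sketch below builds a plain PPM byte payload using only the standard library; the (217, 232) shape is taken from the comments in this thread, and the solid fill color is arbitrary.

```python
# Sketch: generate a synthetic solid-color RGB image as a binary PPM (P6)
# byte payload, usable as stand-in image bytes when reproducing the error.

def make_ppm(width: int, height: int, rgb=(128, 128, 128)) -> bytes:
    header = f"P6 {width} {height} 255\n".encode()
    return header + bytes(rgb) * (width * height)

payload = make_ppm(217, 232)  # the image shape reported above
```

Any image library that reads PPM (or a real test image of the same dimensions) could then be fed into the same preprocessing path that triggered the error.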
Same error, and setting max_model_len does not help. My input image shape is (217, 232), with only one prompt-image pair as input.
Resolved.
> Same error, and setting max_model_len does not help. My input image shape is (217, 232), with only one prompt-image pair as input.
> Resolved.
Could you elaborate?
> Same error, and setting max_model_len does not help. My input image shape is (217, 232), with only one prompt-image pair as input.
> Resolved.

Could you share how you solved it?
> Same error, and setting max_model_len does not help. My input image shape is (217, 232), with only one prompt-image pair as input.
> Resolved.

Could you share the resolution, please? Many of us seem to have the same issue.
Your current environment
🐛 Describe the bug
We are attempting to use Ray v2.23 for batch inference, specifically on multi-modal data, leveraging llava-next.
The error we observe while executing the inference code with a batch size of 500 is shown below: