vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Mismatch in the number of image tokens and placeholders during batch inference #7669

Open sayanbiswas59 opened 2 months ago

sayanbiswas59 commented 2 months ago

Your current environment

Ray v2.23
Python 3.10
vllm 0.5.4
cuda 12.1

🐛 Describe the bug

We are attempting to utilize Ray v2.23 for batch inferencing, specifically on multi-modal data, by leveraging llava-next.

import base64
import io
import time
from typing import Dict

import numpy as np
import ray
from PIL import Image
from vllm import LLM

# gcsInputPath, gcsOutputPath, columns, imageColumnName, workers, gpus,
# batchSize, prompt, and sampling_params are defined elsewhere in our script.

dataset = ray.data.read_parquet(gcsInputPath, columns=columns)

class LLMPredictor:

    def __init__(self):
        # Create an LLM instance per actor.
        self.llm = LLM(model="/mnt/models",
                       tensor_parallel_size=1)

    def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, list]:
        try:
            start_time = time.time()

            # Decode each base64-encoded image and pair it with the text prompt.
            prompts = [
                {
                    "prompt": prompt,
                    "multi_modal_data": {
                        "image": Image.open(io.BytesIO(
                            base64.b64decode(batch[imageColumnName][i])))
                    },
                }
                for i in range(len(batch[imageColumnName]))
            ]

            predictions = self.llm.generate(
                prompts, sampling_params=sampling_params)
            batch["generated_output"] = [preds.outputs[0].text for preds in predictions]
            end_time = time.time()
            print(f'Total Inference Time for {len(prompts)} - {end_time - start_time}')

        except OSError as os_error:
            print(f"OS error: {os_error}")
            batch["generated_output"] = ["" for _ in range(len(batch[imageColumnName]))]

        except Exception as error:
            print(f"Misc error: {error}")
            batch["generated_output"] = ["" for _ in range(len(batch[imageColumnName]))]

        finally:
            # Drop the raw image bytes before writing the batch back out.
            del batch['image_bytes']
            return batch

dataset = dataset.map_batches(
    LLMPredictor,
    concurrency=int(workers) * int(gpus),
    batch_size=int(batchSize),
    num_gpus=1
)

dataset.write_parquet(gcsOutputPath)

The error we observe when executing the inference code with a batch size of 500 is shown below:

Total Inference Time for 480 - 164.62883067131042
Batch Size is : 299 
Misc error: Attempted to assign 2928 + 2928 + 1968 + 2928 + 2064 + 2256 + 2928 + 2928 + 1968 + 2928 + 2928 = 28752 image tokens to 28848 placeholders
DarkLight1337 commented 2 months ago

Which version of vLLM are you using? We fixed an error in the calculation a while back (#6982).
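
For reference, a quick way to confirm the installed version (assuming a standard pip install) is:

import vllm
print(vllm.__version__)  # e.g. "0.5.4"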

It would be great if you could provide the dimensions of the images which caused the error.

sayanbiswas59 commented 2 months ago

We are using vLLM 0.5.4, and we get the above error while running batch inference on the image bytes. Do the images need to be of a certain size for the feature size calculation to work properly?

We have also observed an issue where, if an exception arises while processing an item in the batch, the pending items from the current batch accumulate in the next batch of the succeeding task. This causes subsequent tasks to fail due to overflow.

We are trying to find a solution that allows us to skip the problematic item in the batch and proceed with processing the remaining items. While we have considered skipping the entire batch if an exception occurs, this does not resolve the overflow issue.
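
A rough sketch of the per-item fallback we have been considering (not validated; it reuses the names from the snippet above and trades throughput for isolation) looks like this:

# Hypothetical per-item fallback: generate one prompt at a time so that a
# single failing image only blanks its own row instead of failing the batch.
outputs = []
for p in prompts:
    try:
        result = self.llm.generate([p], sampling_params=sampling_params)
        outputs.append(result[0].outputs[0].text)
    except Exception as error:
        print(f"Skipping item due to error: {error}")
        outputs.append("")
batch["generated_output"] = outputs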

Below is an example of the actor log where we observe the batch overflow after an exception. Batch size = 1000 for this run; the exception occurs while processing task 125/960, and then the pending items start to overflow into subsequent batches.

Processed prompts:   0%|          | 0/960 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts:   0%|          | 1/960 [00:47<12:33:23, 47.14s/it, est. speed input: 23.23 toks/s, output: 0.85 toks/s]
Processed prompts:   1%|          | 10/960 [00:48<56:54,  3.59s/it, est. speed input: 476.70 toks/s, output: 8.18 toks/s] 
Processed prompts:  13%|█▎        | 125/960 [01:58<13:11,  1.05it/s, est. speed input: 7946.17 toks/s, output: 104.40 toks/s]

Processed prompts:   0%|          | 0/1603 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts:   0%|          | 0/1603 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]

Processed prompts:   0%|          | 0/2371 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts:   0%|          | 0/2371 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]

Processed prompts:   0%|          | 0/3139 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts:   0%|          | 0/3139 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]

Processed prompts:   0%|          | 0/4123 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts:   0%|          | 0/4123 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
.
.
.

Processed prompts:   0%|          | 0/1377874 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts:   0%|          | 0/1377874 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]

We are currently blocked by the above errors, as they prevent us from scaling.

DarkLight1337 commented 2 months ago

Does the same error occur if you use a smaller batch size? I think a large batch size may run into the issue of reaching max_model_len, which truncates the remaining image placeholder tokens and thus causes this error.
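
If the context limit is indeed the problem, one thing you could try (just a sketch; the value must stay within what the model's backbone supports) is raising max_model_len explicitly when constructing the engine:

# Sketch: reserve more context for the image placeholder tokens.
# 8192 is an illustrative value, not a recommendation.
llm = LLM(model="/mnt/models",
          tensor_parallel_size=1,
          max_model_len=8192)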

sayanbiswas59 commented 2 months ago

Yes, we have tried an even smaller batch size of 250 and the same issue persists.

DarkLight1337 commented 2 months ago

It would be great if you could provide the dimensions of the images which caused the error.

This would aid us greatly in debugging the issue.
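
Something as simple as the following (a sketch reusing the variables from your snippet) would capture the sizes alongside the failure:

# Hypothetical logging helper: record the decoded size of every image in the
# batch so the failing dimensions can be reported with the error message.
for i in range(len(batch[imageColumnName])):
    img = Image.open(io.BytesIO(base64.b64decode(batch[imageColumnName][i])))
    print(f"item {i}: size={img.size}")  # (width, height)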

sayanbiswas59 commented 2 months ago

While we try to get the dimensions of the images in the batch, could you please let us know how we can get around the second issue when an exception occurs? The batch overflows and ultimately leads to job failure.

Can we abort/discard the batch when it raises an exception, or is there a way to skip the problematic item and proceed with processing the remaining items in the batch?

robertgshaw2-neuralmagic commented 2 months ago

Can you share any sample data so we can reproduce?

Jeremy-J-J commented 2 months ago

Same error here, and setting max_model_len does not help. My input image shape is (217, 232), and there is only one prompt-image pair in the input.

Jeremy-J-J commented 2 months ago

Same error here, and setting max_model_len does not help. My input image shape is (217, 232), and there is only one prompt-image pair in the input.

Resolved.

DarkLight1337 commented 2 months ago

Same error here, and setting max_model_len does not help. My input image shape is (217, 232), and there is only one prompt-image pair in the input.

Resolved.

Could you elaborate?

drizzle0171 commented 2 months ago

Same error here, and setting max_model_len does not help. My input image shape is (217, 232), and there is only one prompt-image pair in the input.

Resolved.

Could you share how you solved it?

luizanao commented 2 months ago

Same error here, and setting max_model_len does not help. My input image shape is (217, 232), and there is only one prompt-image pair in the input.

Resolved.

Could you share the resolution, please? Many of us seem to be hitting the same issue.