vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: ERROR 07-31 11:57:33 async_llm_engine.py:658] Engine iteration timed out. This should never happen! #6969

Open · lucasjinreal opened 1 month ago

lucasjinreal commented 1 month ago

### Your current environment

```
ERROR 07-31 11:57:33 async_llm_engine.py:658] Engine iteration timed out. This should never happen!
```
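For context, the timeout that fires here is configurable. A minimal sketch, assuming the `VLLM_ENGINE_ITERATION_TIMEOUT_S` environment variable shipped in recent vLLM releases (default 60 seconds) and that it is read when the engine module is imported:

```python
import os

# Hedged workaround: raise the async engine's per-iteration timeout.
# VLLM_ENGINE_ITERATION_TIMEOUT_S (default 60) is assumed to be read at import
# time, so it must be set before vllm is imported.
os.environ["VLLM_ENGINE_ITERATION_TIMEOUT_S"] = "180"

from vllm.engine.async_llm_engine import AsyncLLMEngine  # noqa: E402
```

This only papers over slow iterations (e.g. long multi-modal prefills); a genuinely hung engine will still time out.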

### 🐛 Describe the bug

This code does not work:

```python
import time

from vllm import SamplingParams

sampling_params = SamplingParams(
    temperature=0.2,
    max_tokens=1024,
    stop=["<|im_start|>", "<|im_end|>"],
    skip_special_tokens=False,
)

# This loop runs inside an async coroutine (it uses `async for`); `model` is an
# AsyncLLMEngine, and `conv`/`roles` hold the chat-template state.
images = None  # must be initialized before the first turn
while True:
    try:
        inp = input(f"{roles[0]}: ")
    except EOFError:
        inp = ""
    if inp == "\\d":
        print("exit...")
        break

    if is_image(inp):
        images = load_multi_images_maybe(inp)
        # clear conv history
        conv.messages = []
        print("Updated image, start new chat session.")
        continue

    print(f"{roles[1]}: ", end="")

    if images is not None:
        # first message after an image upload: prepend one image token per image
        inp = f"{DEFAULT_IMAGE_TOKEN} " * len(images) + "\n" + inp
        if len(images) > 1:
            inp = convert_image_tags(inp)
        conv.append_message(conv.roles[0], inp)
    else:
        # later, text-only messages
        conv.append_message(conv.roles[0], inp)
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()

    if args.debug:
        print(f"prompt_real: {prompt}")

    if images is not None:
        inputs = {
            "prompt": prompt,
            "multi_modal_data": {
                "image": images[0]  # only the first image is attached here
            },
        }
        # subsequent turns don't need the image again
        images = None
    else:
        inputs = {
            "prompt": prompt,
        }

    # request ids must be unique per request; time.monotonic() serves here
    results_generator = model.generate(inputs, sampling_params, str(time.monotonic()))
    outputs = ""
    async for request_output in results_generator:
        prompt = request_output.prompt
        if request_output.finished:
            print()
        else:
            out = request_output.outputs[-1].text
            if len(out) == 0:
                continue
            # print only the newly generated delta to stream the reply
            out_delta = out[len(outputs):]
            print(out_delta, end="", flush=True)
            outputs = out
```
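A self-contained version of the same streaming pattern, with the chat and image plumbing stripped out, can help isolate whether the timeout comes from the engine itself or from the surrounding loop. A minimal sketch (the model path is a placeholder, and `uuid4` stands in for `time.monotonic()` as the request id):

```python
import asyncio
import uuid

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine


async def main() -> None:
    # Placeholder model path; substitute the multi-modal model from the report.
    engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model="YOUR_MODEL_PATH"))
    params = SamplingParams(temperature=0.2, max_tokens=1024)

    printed = ""
    async for out in engine.generate("Hello, world!", params, str(uuid.uuid4())):
        text = out.outputs[-1].text
        print(text[len(printed):], end="", flush=True)  # stream only the delta
        printed = text
    print()


asyncio.run(main())
```

If this standalone loop also times out, the problem lies in the engine or model setup rather than in the chat code above.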
DarkLight1337 commented 1 month ago

Please report this issue to #5901 so we can better investigate this.