vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: Wait for the response for each prediction #7741

Open savi8sant8s opened 3 months ago

savi8sant8s commented 3 months ago

How would you like to use vllm

I would like to wait for each response from vllm before issuing the next request, because I use the previous predictions to complement the next ones. However, I don't know how to do this with vllm.
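For reference, vllm's offline `LLM.generate` call is synchronous: it returns only after the requested completions are finished, so a plain loop already gives "wait for each prediction" behavior. A minimal sketch of the chaining pattern (the `generate_fn` parameter and `chain_predictions` helper are illustrative names, not vllm API; with vllm, `generate_fn` would wrap `llm.generate`):

```python
# Sequential chaining: each call must finish before the next prompt is
# built. `generate_fn` stands in for a blocking call such as
#   lambda p: llm.generate(p, sampling_params)[0].outputs[0].text

def chain_predictions(lines, generate_fn, n_tail=5):
    """Run generate_fn on each line, feeding the last n_tail words of
    the previous prediction into the next prompt."""
    prev_words = ""
    predictions = []
    for line in lines:
        prompt = f"Previous words: {prev_words}\nLine to correct: {line}\nAnswer:"
        text = generate_fn(prompt)  # blocks until this prediction is done
        predictions.append(text)
        prev_words = " ".join(text.split()[-n_tail:])
    return predictions

if __name__ == "__main__":
    # Stub generator for demonstration; it just echoes the input line.
    echo = lambda p: "corrected " + p.splitlines()[1].split(": ", 1)[1]
    print(chain_predictions(["lne one", "lne two"], echo))
```

The same pattern applies to the script below; the only vllm-specific detail is that `generate` returns a list of `RequestOutput` objects, one per prompt.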

Code

import pandas as pd
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="maritaca-ai/sabia-7b",
    enable_lora=True,             # allow per-request LoRA adapters
    max_model_len=256,
    gpu_memory_utilization=0.95,
    enforce_eager=True,           # skip CUDA graph capture to save memory
)

sampling_params = SamplingParams(
    temperature=0.001,
    max_tokens=256
)

df = pd.read_csv("prompts_bluche_test.csv")

prev_5_words = ''
next_5_words = ''
last_filename_prefix = ''

predicted_lines = []

for index, row in df.iterrows():
    filename_prefix = row['filename'][:8]
    next_filename_prefix = df.iloc[index + 1]['filename'][:8] if index < len(df) - 1 else ''
    # Only pass trailing context when the next line belongs to the same document.
    if last_filename_prefix == '' or filename_prefix == next_filename_prefix:
        next_5_words = row['next_5_words']
    else:
        next_5_words = ''

    # Build the prompt with explicit newlines, so the model does not see
    # the stray indentation a triple-quoted f-string would embed.
    # (Instruction, in Portuguese: "Fix the post-OCR errors in the line.")
    prompt = (
        "### Instrução: Corrija os erros pós-OCR presentes na linha.\n"
        f"### Palavras anteriores: {prev_5_words}\n"
        f"### Linha a corrigir: {row['input']}\n"
        f"### Palavras seguintes: {next_5_words}\n"
        "### Resposta:"
    )

    # llm.generate() is synchronous and returns a list of RequestOutput
    # objects (one per prompt), so index into the list before reading .outputs.
    output = llm.generate(
        prompt,
        sampling_params,
        lora_request=LoRARequest("spelling", 1, "results/api_experiment_run/model/model_weights")
    )
    generated_text = output[0].outputs[0].text

    # Carry the tail of this prediction into the next prompt, but only
    # while we stay within the same document.
    if last_filename_prefix == '' or filename_prefix == last_filename_prefix:
        prev_5_words = " ".join(generated_text.split()[-5:])
    else:
        prev_5_words = ''

    last_filename_prefix = filename_prefix
    predicted_lines.append(generated_text)

predictions = pd.DataFrame(predicted_lines, columns=['prediction'])

predictions.to_csv("predictions_sabia_bluche.csv", index=False)
github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!