vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Misc]: How to force generate a fixed response from llama3 #7770

Open · niuzheng168 opened this issue 2 months ago

niuzheng168 commented 2 months ago

Anything you want to discuss about vllm.

I found that Llama generates different responses for the same static input when:

  1. Running it multiple times
  2. Batching it with other inputs

This happens even when I set temperature=1, top_k=1, and a random seed.

The generated text is usually the same for the first few tokens, but after that it diverges.

Does anyone know how to force it to generate a fixed response?

import torch
from vllm import LLM, SamplingParams

torch.random.manual_seed(999)

llm = LLM(model='/home/Meta-Llama-3-8B-Instruct')
prompts = [
    "Hi my name is",
    "The capital of France is"
]

# Generate the same prompts multiple times with top_k=1 sampling
texts = []
for i in range(10):
    sampling_params = SamplingParams(temperature=1, top_k=1, max_tokens=100, top_p=1)
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        texts.append(generated_text)
for text in texts:
    print(text)

# Generate the same prompts again while growing the batch by two each iteration
texts = []
for i in range(5):
    prompts.append(prompts[0])
    prompts.append(prompts[1])

    sampling_params = SamplingParams(temperature=1, top_k=1, max_tokens=100)
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        texts.append(generated_text)

for text in texts:
    print(text)

arunpatala commented 2 months ago

By the way, temperature should be 0; vLLM treats that as greedy decoding.
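
For reference, a minimal greedy setup might look like the sketch below (the model path is the one from the snippet above; with temperature=0, vLLM picks the argmax token at each step, so top_k, top_p, and seeds no longer influence token selection):

from vllm import LLM, SamplingParams

# Sketch: temperature=0 means greedy decoding in vLLM, so no sampling
# randomness is involved in choosing the next token.
llm = LLM(model='/home/Meta-Llama-3-8B-Instruct')
greedy_params = SamplingParams(temperature=0, max_tokens=100)

outputs = llm.generate(["The capital of France is"], greedy_params)
print(outputs[0].outputs[0].text)

Even with greedy decoding, though, batching can still flip near-ties in the logits, which is what the paper below gets at.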

There was an interesting paper that discusses this issue; it basically says it's due to GPU non-determinism.

https://arxiv.org/pdf/2408.04667

Hope you find it useful.
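
As a rough, hypothetical illustration of that effect (a self-contained sketch, not code from the paper): the same hidden state can produce slightly different logits depending on whether it is computed alone or inside a larger batch, because different batch shapes can dispatch different GPU kernels with different floating-point reduction orders.

import torch

# Hypothetical sketch: compare the result for one row computed alone vs.
# inside a batch. On GPU, different batch shapes may select different
# matmul kernels / reduction orders, so the values can differ slightly.
torch.manual_seed(0)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
dtype = torch.float16 if device == 'cuda' else torch.float32

hidden = torch.randn(1, 4096, device=device, dtype=dtype)    # one "token" of hidden state
weight = torch.randn(4096, 8192, device=device, dtype=dtype) # stand-in for an LM head

logits_alone = hidden @ weight                          # processed as a batch of 1
logits_in_batch = (hidden.repeat(8, 1) @ weight)[0:1]   # same row inside a batch of 8

# Any nonzero difference can flip an argmax near a tie, which is exactly
# where greedy / top_k=1 generations start to diverge.
print((logits_alone - logits_in_batch).abs().max().item())

Once a single near-tie flips, every later token is conditioned on a different prefix, which matches the "same first few tokens, then divergence" pattern in the original report.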