I found that Llama generates different responses for the same static input when:
Running it multiple times
Batching it with other inputs
This happens even when I set temp=1, top_k=1, and a random seed.
The generated text is usually identical for the first few tokens, but after that it diverges.
Does anyone know how to force it to generate a fixed response?
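For reference, this is the kind of sampling setup I would expect to pin the output completely (a minimal sketch only; temperature=0 for greedy decoding and the per-request seed argument on SamplingParams are my assumptions about the vLLM API, not a confirmed fix for the behavior below):

from vllm import SamplingParams

# Assumption: temperature=0 makes vLLM decode greedily (pure argmax), and the
# per-request seed (available in newer vLLM versions) pins any leftover
# sampling randomness.
greedy_params = SamplingParams(
    temperature=0,   # greedy decoding instead of temperature=1 + top_k=1
    max_tokens=100,
    seed=999,        # assumption: per-request seed, if your vLLM version supports it
)

My actual reproduction script is below.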
import torch
from vllm import LLM, SamplingParams

torch.random.manual_seed(999)

llm = LLM(model='/home/Meta-Llama-3-8B-Instruct')

prompts = [
    "Hi my name is",
    "The capital of France is",
]

# Case 1: generate the same batch multiple times
texts = []
for i in range(10):
    sampling_params = SamplingParams(temperature=1, top_k=1, max_tokens=100, top_p=1)
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        texts.append(generated_text)

for text in texts:
    print(text)

# Case 2: generate with a different batch each time
# (the same two prompts, batched together with extra copies of themselves)
texts = []
for i in range(5):
    prompts.append(prompts[0])
    prompts.append(prompts[1])
    sampling_params = SamplingParams(temperature=1, top_k=1, max_tokens=100)
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        texts.append(generated_text)

for text in texts:
    print(text)
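To make the divergence easier to see, this is the kind of check I run after the script above (a sketch that reuses the llm, prompts, and sampling_params variables already defined; nothing vLLM-specific beyond the fields the script already uses). It groups generations by prompt and counts distinct completions; with truly deterministic decoding, every prompt should map to exactly one text.

from collections import defaultdict

# Group generations by their prompt and count distinct completions.
completions = defaultdict(set)
for output in llm.generate(prompts, sampling_params):
    completions[output.prompt].add(output.outputs[0].text)

for prompt, texts_for_prompt in completions.items():
    print(f"{prompt!r}: {len(texts_for_prompt)} distinct completion(s)")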