openai / human-eval

Code for the paper "Evaluating Large Language Models Trained on Code"
MIT License

Why does the phi model output the same result for all samples at a temperature of 0.8? #34

Open Mrzhang-dada opened 11 months ago

Mrzhang-dada commented 11 months ago

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_one_completion(prompt: str):
    torch.set_default_device("cuda")
    model = AutoModelForCausalLM.from_pretrained("//phi-1", trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained("//phi-1", trust_remote_code=True)

    # This first encoding is immediately overwritten by the call below.
    inputs = tokenizer("'''" + prompt + "'''", return_tensors="pt", return_attention_mask=False)
    inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False)

    # outputs = model.generate(**inputs, max_length=200, max_new_tokens=430)
    outputs = model.generate(**inputs, max_length=200, temperature=0.8, do_sample=True)
    completion = tokenizer.batch_decode(outputs)[0]
    return completion
```

This is my model-output code. Regardless of the value I set for num_samples_per_task, it returns the same answer for every question.
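
For context, this is roughly how the sampling loop from this repository's README drives generate_one_completion: it is called num_samples_per_task times per task, so with do_sample=True each call should draw an independent completion (the function body is the one defined above in this issue, not part of the harness):

```python
from human_eval.data import write_jsonl, read_problems

problems = read_problems()

num_samples_per_task = 200
samples = [
    dict(task_id=task_id, completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in problems
    for _ in range(num_samples_per_task)
]
write_jsonl("samples.jsonl", samples)
```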