openlm-research / open_llama

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
Apache License 2.0

Giving multiple answers [BUG] #29

Closed jav-ed closed 1 year ago

jav-ed commented 1 year ago

Take a look at the code I used:

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = 'openlm-research/open_llama_3b_600bt_preview'

# load the tokenizer and the model (fp16, placed automatically on available devices)
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto',
)

prompt = 'Q: What is the captial of Pakistan?\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# generate up to 32 new tokens and decode the full sequence (prompt + continuation)
generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=32
)
print(tokenizer.decode(generation_output[0]))

This is the result:

⁇ Q: What is the captial of Pakistan? A: Pakistan Q: What is the capital of Pakistan? A: Islamabad Q: What is the capital of Pakistan? A: Karachi

Why do I get three responses? Unfortunately, this keeps happening. How can I get only one answer?

young-geng commented 1 year ago

This is expected. OpenLLaMA is a base model, so you'll need to finetune it yourself to turn it into a chatbot that answers your questions. This is called instruction finetuning, and it is exactly what recent works like Alpaca, Vicuna, and Koala did.
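
In the meantime, a common workaround is to post-process the generated text and keep only the first answer. A minimal sketch (assuming the same 'Q:'/'A:' prompt format used above):

text = tokenizer.decode(generation_output[0], skip_special_tokens=True)
# keep only the text after the first "A:" and cut it off where the model
# starts inventing the next "Q:" line
answer = text.split('A:', 1)[-1].split('Q:', 1)[0].strip()
print(answer)

This does not make the model any better at answering; it only hides the extra continuations.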

jav-ed commented 1 year ago

It gives me three different question/answer pairs, and only one of the answers is correct. Changing the code to the following gives the correct answer, but it still appears three times. So maybe fine-tuning is not the only option?

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=32, eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(generation_output[0]))
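
Since the base model rarely emits an EOS token in this free-form setting, eos_token_id alone usually does not stop it. Another option, sketched below with the transformers StoppingCriteria API (the StopOnQ class name is just for illustration), is to halt generation as soon as the continuation starts a new 'Q:' line:

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnQ(StoppingCriteria):
    """Stop once the generated continuation contains a new 'Q:' marker."""
    def __init__(self, tokenizer, prompt_len):
        self.tokenizer = tokenizer
        self.prompt_len = prompt_len

    def __call__(self, input_ids, scores, **kwargs):
        # decode only the newly generated tokens, not the prompt
        continuation = self.tokenizer.decode(input_ids[0, self.prompt_len:])
        return 'Q:' in continuation

stopping_criteria = StoppingCriteriaList([StopOnQ(tokenizer, input_ids.shape[1])])
generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=32, stopping_criteria=stopping_criteria,
)
# the decoded text may still end with the 'Q' that triggered the stop and
# may need a final trim, e.g. with the split shown earlier
print(tokenizer.decode(generation_output[0], skip_special_tokens=True))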