openlm-research / open_llama

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
Apache License 2.0
7.36k stars · 374 forks

Is this open_llama output right? How can I solve this problem? Thanks #46

Closed: jieniu closed this issue 1 year ago

jieniu commented 1 year ago

Hi, I used the example code to run the model. This is my code:

$cat test.py
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = 'open_llama_3b'
# model_path = 'openlm-research/open_llama_7b'
# model_path = 'openlm-research/open_llama_13b_600bt'

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto',
)

prompt = 'Q: What is the largest animal?\nA:'
tokenizer_result = tokenizer(prompt, return_tensors="pt")
input_ids = tokenizer_result.input_ids.to('cuda')
attention_mask = tokenizer_result.attention_mask.to('cuda')

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=32
)
print(tokenizer.decode(generation_output[0]))

But I get this output; it seems the LLM will not stop generating until it exhausts the maximum sequence length:

$python test.py
<s>Q: What is the largest animal?
A: The blue whale.
Q: What is the largest animal?
A: The blue whale. It is the largest animal on Earth. It is also the

How can I solve this problem? Thank you.

young-geng commented 1 year ago

This is expected behavior. The OpenLLaMA model is a pretrained base model and therefore should not be used directly as a dialogue model: it simply continues the text until it runs out of tokens. To get a dialogue model, you'll need to finetune it yourself or use a model that others have finetuned on top of OpenLLaMA.
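If you only want the first answer in this few-shot prompt style, a common workaround with the transformers library is to stop generation once the base model starts inventing a new "Q:" line. Below is a minimal sketch reusing the tokenizer, model, input_ids and prompt from the script above; the StopOnSubstring class and the '\nQ:' stop string are illustrative choices, not something from this thread.

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSubstring(StoppingCriteria):
    # Illustrative helper (not part of transformers): stop as soon as the
    # decoded continuation contains a given substring.
    def __init__(self, tokenizer, stop_string, prompt_length):
        self.tokenizer = tokenizer
        self.stop_string = stop_string
        self.prompt_length = prompt_length

    def __call__(self, input_ids, scores, **kwargs):
        # Only look at the tokens generated after the prompt.
        text = self.tokenizer.decode(input_ids[0, self.prompt_length:])
        return self.stop_string in text

stops = StoppingCriteriaList(
    [StopOnSubstring(tokenizer, '\nQ:', input_ids.shape[1])]
)
generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=32, stopping_criteria=stops
)
# Strip the prompt and anything after the next invented "Q:".
text = tokenizer.decode(generation_output[0], skip_special_tokens=True)
print(text[len(prompt):].split('\nQ:')[0].strip())

This only trims the continuation; the underlying model behavior is unchanged, which is why finetuning (or using an instruction-tuned variant) is the proper fix for dialogue use.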

jieniu commented 1 year ago

thanks