kaistAI / LangBridge

[ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision
https://arxiv.org/abs/2401.10695

Llama 2 model generates the same tokens over and over #4

Closed u1vi closed 4 months ago

u1vi commented 4 months ago

Hi folks,

I am trying the "kaist-ai/llama2-langbridge-9b" model, but it keeps generating meaningless tokens and I cannot get the output I want. Maybe it is because of the template I am using. Do you have any suggestions or solutions?

Here is my implementation:

from transformers import AutoTokenizer
from langbridge import LangBridgeModel

enc_tokenizer = AutoTokenizer.from_pretrained('kaist-ai/langbridge_encoder_tokenizer') 
lm_tokenizer = AutoTokenizer.from_pretrained('kaist-ai/llama2-langbridge-9b')
model = LangBridgeModel.from_pretrained('kaist-ai/llama2-langbridge-9b').to('cuda')

temp = ("<s> [INST] <<SYS>>  \n{system_prompt}\n <</SYS>> {user_message} [/INST]")
ss = 'You are inteligent and helpfull agent.'

question = "How are you?"
prefix =  temp.format(system_prompt=ss, user_message = question)

print(prefix)
output = model.generate_from_prefix(enc_tokenizer, lm_tokenizer, prefix=prefix)
print(output[0])

The output I am getting:

<s> [INST] <<SYS>>

I am a very helpful agent.
<</SYS>> How are you? [INST]

<s> [INST] <<SYS>>

I am a very helpful agent.
<</SYS>> How are you? [INST]

<s> [INST] <<SYS>>

I am a very helpful agent.
<</SYS>> How are you? [INST]

<s> [INST] <<SYS>>

I am a very helpful agent.
<</SYS>> How are you? [INST]

<s> [INST] <<SYS>>

This is the warning I am getting (it may be related):

 UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
  warnings.warn(
u1vi commented 4 months ago

Update:

I deleted the template special tokens. Now it generates something (sometimes meaningless, sometimes okay), but it is still looping.

Prompt: How are you?
Response: I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing great. I'm doing
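
One common way to dampen this kind of looping with Hugging Face-based generation is a repetition penalty or an n-gram block. A minimal sketch, assuming generate_from_prefix forwards extra keyword arguments to the underlying model.generate call (that forwarding is an assumption; check the LangBridge source before relying on it):

from transformers import AutoTokenizer
from langbridge import LangBridgeModel

enc_tokenizer = AutoTokenizer.from_pretrained('kaist-ai/langbridge_encoder_tokenizer')
lm_tokenizer = AutoTokenizer.from_pretrained('kaist-ai/llama2-langbridge-9b')
model = LangBridgeModel.from_pretrained('kaist-ai/llama2-langbridge-9b').to('cuda')

# Standard HF generation knobs that discourage verbatim loops.
# ASSUMPTION: generate_from_prefix passes these through to model.generate;
# if it does not, set them on the underlying LM's generation_config instead.
output = model.generate_from_prefix(
    enc_tokenizer,
    lm_tokenizer,
    prefix="How are you?",
    no_repeat_ngram_size=4,   # forbid any 4-gram from repeating verbatim
    repetition_penalty=1.2,   # mildly penalize tokens already generated
)
print(output[0])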

Another problem: for other languages, including Korean, it just generates newline tokens and nothing else.

For Turkish and Russian it hallucinates: it answers in Hungarian for Turkish prompts, and generates very random text for Russian.

Prompt: объясни мне слово друг
Translation: explain the word friend to me
Response:


—«Дорогая, я не знаю, как это сказать, но я люблю тебя.
—«Я знаю, как это сказать,»
—ответила она,
«я люблю тебя, как люблю мою мать.
«Я люблю тебя, как люблю мою мать, как люблю мою сестру, как люблю мою собаку, как люблю мою соседку, как люблю мою соседку собаку, как люблю мою сосе

I was expecting to get the response in English, but never mind. Translation:

Yu.

— “Darling, I don’t know how to say this, but I love you.
—“I know how to say it,”
- she answered,
“I love you like I love my mother.
“I love you as I love my mother, as I love my sister, as I love my dog, as I love my neighbor, as I love my neighbor’s dog, as I love my neighbor.”
MattYoon commented 4 months ago

Hi @u1vi, thanks for reporting this issue.

The problems you've described are expected, because the llama2-langbridge-9b model is based on the non-instruction-tuned version of Llama 2 and most likely requires few-shot examples to generate what you want.

> Maybe it is because of the template that I am using. Do you have any suggestions or solutions?

Since the model was aligned using unlabeled corpora, there is no template for the llama2-langbridge model.
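
For illustration, a few-shot prefix for this base model could look like the sketch below. The exemplars and their Q/A layout are my own invention, not a format from the LangBridge repo; the base model simply continues the pattern:

from transformers import AutoTokenizer
from langbridge import LangBridgeModel

enc_tokenizer = AutoTokenizer.from_pretrained('kaist-ai/langbridge_encoder_tokenizer')
lm_tokenizer = AutoTokenizer.from_pretrained('kaist-ai/llama2-langbridge-9b')
model = LangBridgeModel.from_pretrained('kaist-ai/llama2-langbridge-9b').to('cuda')

# Plain-text few-shot prompting for the non-instruction-tuned model:
# show a couple of exemplars, then the new question, and let the model
# continue the established pattern.
few_shot_prefix = (
    "Q: What is the capital of France?\n"
    "A: The capital of France is Paris.\n\n"
    "Q: What is two plus two?\n"
    "A: Two plus two is four.\n\n"
    "Q: How are you?\n"
    "A:"
)
output = model.generate_from_prefix(enc_tokenizer, lm_tokenizer, prefix=few_shot_prefix)
print(output[0])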

MattYoon commented 4 months ago

I suggest you try the orca2-langbridge models if you want a model that works zero-shot (they are instruction-tuned).

These do have specific templates, the same as those of the original Orca 2 models.

from transformers import AutoTokenizer
from langbridge import LangBridgeModel

# our pretrained langbridge models all leverage this encoder tokenizer
enc_tokenizer = AutoTokenizer.from_pretrained('kaist-ai/langbridge_encoder_tokenizer') 
lm_tokenizer = AutoTokenizer.from_pretrained('kaist-ai/orca2-langbridge-9b')
model = LangBridgeModel.from_pretrained('kaist-ai/orca2-langbridge-9b').to('cuda')

system_message = "You are an AI assistant. You will be given a task. You must generate a detailed and long answer."
user_message = "объясни мне слово друг"

prompt = f"<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{user_message}<|im_end|>\n<|im_start|>assistant"
prefix =  prompt.format(system_message=system_message, user_message=user_message)
output = model.generate_from_prefix(enc_tokenizer, lm_tokenizer, prefix=prefix)
print(output)
Output:

The word "friend" is a term used to describe a close, personal relationship between two or more individuals. It is derived from the Old English word "frēond," which means "friend" or "ally." The concept of friendship has been present in human societies throughout history, and it is often associated with mutual trust, support, and affection.\n\nFriendship can be categorized into different types, such as:\n\n1. Acquaintance: This is a superficial relationship where two people have a brief or occasional interaction, but they do not share a deep emotional connection.\n\n2. Casual friend: This type of friendship is characterized by a more relaxed and inform
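
Since generate_from_prefix appears to return a list of strings (the first snippet indexes output[0]), the same template can be reused for the other languages mentioned above. An untested sketch with a Korean prompt, using a hypothetical chat_prefix helper:

from transformers import AutoTokenizer
from langbridge import LangBridgeModel

enc_tokenizer = AutoTokenizer.from_pretrained('kaist-ai/langbridge_encoder_tokenizer')
lm_tokenizer = AutoTokenizer.from_pretrained('kaist-ai/orca2-langbridge-9b')
model = LangBridgeModel.from_pretrained('kaist-ai/orca2-langbridge-9b').to('cuda')

# Hypothetical helper wrapping the Orca 2 / ChatML-style template shown above.
def chat_prefix(system_message: str, user_message: str) -> str:
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant"
    )

# Korean prompt meaning "explain the word 'friend' to me",
# mirroring the Russian example above.
prefix = chat_prefix(
    "You are an AI assistant. You will be given a task. "
    "You must generate a detailed and long answer.",
    "친구라는 단어를 설명해줘",
)
output = model.generate_from_prefix(enc_tokenizer, lm_tokenizer, prefix=prefix)
print(output[0])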
u1vi commented 4 months ago

Thank you @MattYoon for your fast reply.