unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

llama 2 based model does not stop generating answers during inference #1008

Open Karoljv opened 3 weeks ago

Karoljv commented 3 weeks ago

I have a problem after finetuning, during inference: the model does not stop generating and keeps producing additional answers even after it has already answered the question. The model is based on Llama 2. It looks like the model has a problem with the eos token somehow.

Here is my tokenizer:

LlamaTokenizerFast(name_or_path='OPI-PG/Qra-7b', vocab_size=32000, model_max_length=4096, is_fast=True, padding_side='left', truncation_side='right',
    special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '</s>'},
    clean_up_tokenization_spaces=False), added_tokens_decoder={
    0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}

I have set padding_side to 'right' and set tokenizer.pad_token = tokenizer.eos_token.

My formatting func looks like this:

def create_conversation(sample) -> dict:
    strip_characters = "\"'"
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user",
             "content": f"{sample['instruction'].strip(strip_characters)} "
                        f"{sample['input'].strip(strip_characters)}"},
            {"role": "assistant",
             "content": f"{sample['output'].strip(strip_characters)}"},
        ]
    }
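To show what that formatter actually produces, here is a runnable sketch with a made-up sample (the system_message text and the sample values are placeholders, not from my real dataset):

```python
# Sanity check of the create_conversation formatter.
# system_message and the sample dict below are illustrative placeholders.
system_message = "You are a helpful assistant."

def create_conversation(sample) -> dict:
    strip_characters = "\"'"
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user",
             # quotes around instruction/input are stripped, then joined with a space
             "content": f"{sample['instruction'].strip(strip_characters)} "
                        f"{sample['input'].strip(strip_characters)}"},
            {"role": "assistant",
             "content": f"{sample['output'].strip(strip_characters)}"},
        ]
    }

sample = {"instruction": "'Summarise'", "input": '"the text"', "output": "'Done'"}
msgs = create_conversation(sample)["messages"]
print(msgs[1]["content"])  # -> Summarise the text
print(msgs[2]["content"])  # -> Done
```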

Here is my tokenizer.chat_template (without setting it manually I got an error):

tokenizer.chat_template = "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% elif false == true and not '<<SYS>>' in messages[0]['content'] %}{% set loop_messages = messages %}{% set system_message = '' %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\n' + system_message + '\n<</SYS>>\n\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'system' %}{{ '<<SYS>>\n' + content.strip() + '\n<</SYS>>\n\n' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' ' + eos_token }}{% endif %}{% endfor %}"
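For reference, hand-rendering that Llama-2 style template for one system/user/assistant turn gives a string like the one below (the bos/eos strings and the message texts are my own assumptions, using the standard Llama tokens):

```python
# Hand-rendered output of the chat template above for a single turn.
# The template folds the system prompt into the first user message and
# appends eos after the assistant reply.
bos, eos = "<s>", "</s>"          # standard Llama special tokens (assumption)
system = "You are helpful."
user = "Hello"
assistant = "Hi!"

text = (bos + "[INST] "
        + "<<SYS>>\n" + system + "\n<</SYS>>\n\n" + user
        + " [/INST]"
        + " " + assistant + " " + eos)
print(text)
```

The key point is the trailing " " + eos: if the eos token never survives into the training labels, the model never learns to stop here.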

The output of generation looks like this (the model keeps repeating the same Polish sentence, roughly: "The opening ceremony of the 2024 Summer Olympics in Paris was controversial due to a reenactment of Leonardo da Vinci's Last Supper by drag queens."): Ceremonia otwarcia Letnich Igrzysk Olimpijskich 2024 w Paryżu była kontrowersyjna ze względu na odtworzenie obrazu Leonarda da Vinci Ostatnia Wieczerza przez drag queens. \n\nCeremonia otwarcia Letnich Igrzysk Olimpijskich 2024 w Paryżu była kontrowersyjna ze względu na odtworzenie obrazu Leonarda da Vinci Ostatnia Wieczerza przez drag queens. \n\nCeremonia otwarcia Letnich Igrzysk Olimpijskich 2024 w Paryżu była kontrowersyjna ze względu na odtworzenie obrazu Leonarda da Vinci Ostatnia Wieczerza przez drag queens. \n\nCeremonia otwarcia Letnich Igrzysk Olimpijskich 2024 w Paryżu była kontrowersyjna ze względu na odtworzenie obrazu Leonarda da Vinci Ostatnia Wieczerza przez drag queens. \n\nCeremonia ot

It keeps repeating the same answer. Why is that?

danielhanchen commented 2 weeks ago

Wait did you set the pad_token == eos_token during finetuning?

Karoljv commented 2 weeks ago

I did, since Unsloth set the pad_token with a message like this: OPI-PG/Qra-7b does not have a padding token! Will use pad_token "unk". I don't know which pad_token it is referring to, but this model does have a pad_token: https://huggingface.co/OPI-PG/Qra-7b/blob/main/tokenizer_config.json

Karoljv commented 2 weeks ago

OK, I let the finetuning run with this "unk" pad token and I no longer have problems with endless generation. I also let Unsloth fix the tokenizer by setting fix_tokenizer = True. I read on a forum that if the pad token and the eos token are the same, the model tends not to learn the eos token properly, which results in endless generation.
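The mechanism that forum described can be sketched in plain Python without any model (token ids here are illustrative; 2 plays the role of </s> and 0 the role of the <unk> fallback pad):

```python
# Why pad_token == eos_token can break EOS learning:
# collators typically exclude padding positions from the loss by setting
# their label to -100. If eos doubles as pad, the real eos at the end of
# every answer is masked out too, so the model never sees it as a target.
EOS_ID = 2
PAD_DISTINCT = 0  # e.g. the "unk" id Unsloth fell back to

def mask_labels(input_ids, pad_id):
    # typical label masking: every pad position becomes -100 (ignored by loss)
    return [-100 if tok == pad_id else tok for tok in input_ids]

answer = [1, 15, 27, 42, EOS_ID]  # <s> ... answer tokens ... </s>

# distinct pad id: the eos label survives, so the model learns to emit it
labels_ok = mask_labels(answer + [PAD_DISTINCT] * 2, PAD_DISTINCT)
print(labels_ok)   # -> [1, 15, 27, 42, 2, -100, -100]

# pad == eos: every eos, including the real one, is masked away
labels_bad = mask_labels(answer + [EOS_ID] * 2, EOS_ID)
print(labels_bad)  # -> [1, 15, 27, 42, -100, -100, -100]
```

This matches the behaviour I saw: with pad_token = eos_token the model never stopped, and with the distinct "unk" pad it does.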

danielhanchen commented 2 weeks ago

Oh wait, I thought we auto-set the pad_token :) Did you manually set it?