unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

The attention mask and the pad token id were not set. #537

Open muhammadumair894 opened 5 months ago

muhammadumair894 commented 5 months ago

```text
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Unsloth: Not a fast tokenizer, so can't process it as of yet :( Please log a Github issue if you want this as a new feature! Your chat template will still work, but it won't add or edit tokens.
```

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "mistral",  # Supports zephyr, chatml, mistral, llama-3, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"},  # ShareGPT style
    map_eos_token = True,  # Maps <|im_end|> to </s> instead
)

FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

messages = [
    {"from": "human", "value": "I am on social security disability income. I own a house that has equity. It's homesteaded and we've been here for 20 years. I have little consumer debt, except a few charged off accounts in dispute"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,  # Must add for generation
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(input_ids = inputs, max_new_tokens = 1024, use_cache = True)
tokenizer.batch_decode(outputs)
```
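For what it's worth, the first two messages come from the `transformers` `generate` call rather than from Unsloth, and they usually go away once the attention mask and pad token id are passed explicitly. A minimal sketch, assuming the same `model`, `tokenizer`, and `messages` as above and a `transformers` version where `apply_chat_template` accepts `return_dict`:

```python
# Sketch only: pass the attention mask and pad token id explicitly so generate()
# does not have to guess them. Assumes the same model / tokenizer / messages as above.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,  # returns input_ids *and* attention_mask
).to("cuda")

outputs = model.generate(
    input_ids = inputs["input_ids"],
    attention_mask = inputs["attention_mask"],  # silences the attention-mask warning
    pad_token_id = tokenizer.pad_token_id,      # silences the pad_token_id warning
    max_new_tokens = 1024,
    use_cache = True,
)
tokenizer.batch_decode(outputs)
```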

danielhanchen commented 5 months ago

Is this a Mistral model?

hannesfant commented 2 weeks ago

> Is this a Mistral model?

We're seeing the same problem with Mistral NeMo. Is this anything we need to worry about? Is it possible to correct after fine-tuning, or does it invalidate our current model?

danielhanchen commented 1 week ago

@hannesfant Oh, is the output correct? They're just warnings.
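For a single, unpadded prompt like the one above, the missing attention mask is indeed harmless: with no padding tokens, the mask the tokenizer would have returned is all ones, so `generate` sees the same input either way. A quick hedged sanity check, reusing the `tokenizer` and `messages` from the first post:

```python
# Hedged sanity check: for a single, unpadded prompt the attention mask is all ones,
# so leaving it out does not change what generate() sees.
enc = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
).to("cuda")
assert bool(enc["attention_mask"].all()), "zeros appear only when a batch is padded"
```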