unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Add support for Llama 3 #350

Closed: rwl4 closed this issue 2 weeks ago

rwl4 commented 6 months ago

It looks like the tokenizer patching breaks. Here's the log:

ValueError                                Traceback (most recent call last)
Cell In[1], line 20
      7 # 4bit pre quantized models we support for 4x faster downloading + no OOMs.
      8 fourbit_models = [
      9     "unsloth/mistral-7b-bnb-4bit",
     10     "unsloth/mistral-7b-v0.2-bnb-4bit", # New Mistral 32K base model
   (...)
     17     "unsloth/gemma-2b-bnb-4bit",
     18 ] # More models at https://huggingface.co/unsloth
---> 20 model, tokenizer = FastLanguageModel.from_pretrained(
     21     model_name = "/srv/models/Meta-Llama-3-8B", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
     22     max_seq_length = max_seq_length,
     23     dtype = dtype,
     24     load_in_4bit = load_in_4bit,
     25     # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
     26 )

File ~/.local/lib/python3.10/site-packages/unsloth/models/loader.py:138, in FastLanguageModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, *args, **kwargs)
    135     tokenizer_name = None
    136 pass
--> 138 model, tokenizer = dispatch_model.from_pretrained(
    139     model_name     = model_name,
    140     max_seq_length = max_seq_length,
    141     dtype          = dtype,
    142     load_in_4bit   = load_in_4bit,
    143     token          = token,
    144     device_map     = device_map,
    145     rope_scaling   = rope_scaling,
    146     fix_tokenizer  = fix_tokenizer,
    147     model_patcher  = dispatch_model,
    148     tokenizer_name = tokenizer_name,
    149     trust_remote_code = trust_remote_code,
    150     *args, **kwargs,
    151 )
    153 # In case the model supports tagging, add the unsloth tag.
    154 if hasattr(model, "add_model_tags"):

File ~/.local/lib/python3.10/site-packages/unsloth/models/llama.py:1121, in FastLlamaModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, model_patcher, tokenizer_name, trust_remote_code, **kwargs)
   1112 tokenizer_name = model_name if tokenizer_name is None else tokenizer_name
   1113 tokenizer = load_correct_tokenizer(
   1114     tokenizer_name    = tokenizer_name,
   1115     model_max_length  = max_position_embeddings,
   (...)
   1118     trust_remote_code = trust_remote_code,
   1119 )
-> 1121 model, tokenizer = patch_tokenizer(model, tokenizer)
   1122 model = model_patcher.post_patch(model)
   1124 # Patch up QKV / O and MLP

File ~/.local/lib/python3.10/site-packages/unsloth/models/_utils.py:152, in patch_tokenizer(model, tokenizer)
    149 if not hasattr(tokenizer, "pad_token") or tokenizer.pad_token is None:
    150     # Fixes https://github.com/unslothai/unsloth/issues/5
    151     if hasattr(tokenizer, "unk_token"):
--> 152         tokenizer.add_special_tokens({"pad_token" : tokenizer.unk_token})
    153         tokenizer.pad_token = tokenizer.unk_token
    154     else:

File ~/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:973, in SpecialTokensMixin.add_special_tokens(self, special_tokens_dict, replace_additional_special_tokens)
    971 else:
    972     if not isinstance(value, (str, AddedToken)):
--> 973         raise ValueError(f"Token {value} for key {key} should be a str or an AddedToken instance")
    974     if isinstance(value, (str)):
    975         # for legacy purpose we default to stripping. `False` depends on this
    976         value = AddedToken(value, rstrip=False, lstrip=False, normalized=False, special=True)

ValueError: Token None for key pad_token should be a str or an AddedToken instance
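For context, the failure is readable straight from the traceback: patch_tokenizer falls back to reusing unk_token as the padding token, and Llama 3's tokenizer exposes an unk_token attribute whose value is None, so the hasattr check passes and add_special_tokens receives None. The sketch below is only an illustration of that failure mode plus a None-aware guard, not the fix unsloth later shipped; the model path is the local one from the report above, and the EOS fallback is an assumption.

# Illustration of the failure mode and a None-aware guard (not unsloth's
# actual patch). The model path is the local checkout used in the report.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/srv/models/Meta-Llama-3-8B")

if tokenizer.pad_token is None:
    if getattr(tokenizer, "unk_token", None) is not None:
        # Llama 2-style tokenizers: reuse the UNK token for padding.
        tokenizer.add_special_tokens({"pad_token": tokenizer.unk_token})
        tokenizer.pad_token = tokenizer.unk_token
    else:
        # Llama 3 defines no UNK token, so fall back to EOS (illustrative choice).
        tokenizer.pad_token = tokenizer.eos_token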
DrewThomasson commented 6 months ago

Agreed, I want Llama 3 support as well.

danielhanchen commented 6 months ago

yes yes working on it!

danielhanchen commented 6 months ago

FIXED!!

danielhanchen commented 6 months ago

Colab notebook: https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing
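For anyone who wants the equivalent in a local script rather than the notebook, a minimal load sketch follows; the sequence length and dtype values are illustrative defaults rather than settings mandated by this thread, and the pre-quantized repo name is the one mentioned later in this issue.

# Minimal sketch of loading the newly supported Llama 3 base model
# (max_seq_length and dtype are illustrative; the Colab above is canonical).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "unsloth/llama-3-8b-bnb-4bit",  # pre-quantized 4-bit weights
    max_seq_length = 2048,
    dtype          = None,   # None lets unsloth auto-detect (bfloat16 on recent GPUs)
    load_in_4bit   = True,
)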

BedirT commented 6 months ago

I think you should update the README ASAP for this :) it will be a good adv. @danielhanchen

DrewThomasson commented 6 months ago

Colab notebook: https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing

OMG THANK YOU SO MUCH! Already fine-tuning my own models with this Colab.

rwl4 commented 6 months ago

Looking good, except the chat templating isn't quite right due to the tokenizer change.

FileNotFoundError                         Traceback (most recent call last)
Cell In[5], line 3
      1 from unsloth.chat_templates import get_chat_template
----> 3 tokenizer = get_chat_template(
      4     tokenizer,
      5     chat_template = "chatml",
      6     mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"},
      7     map_eos_token = True,
      8 )
     10 def formatting_prompts_func(examples):
     11     convos = examples["conversations"]

File ~/.local/lib/python3.10/site-packages/unsloth/chat_templates.py:379, in get_chat_template(tokenizer, chat_template, mapping, map_eos_token)
    377         # Must fix the sentence piece tokenizer since there's no tokenizer.model file!
    378         token_mapping = { old_eos_token : stop_word, }
--> 379         tokenizer = fix_sentencepiece_tokenizer(tokenizer, new_tokenizer, token_mapping,)
    380     pass
    382 else:

File ~/.local/lib/python3.10/site-packages/unsloth/tokenizer_utils.py:222, in fix_sentencepiece_tokenizer(old_tokenizer, new_tokenizer, token_mapping, temporary_location)
    219 old_tokenizer.save_pretrained(temporary_location)
    221 tokenizer_file = sentencepiece_model_pb2.ModelProto()
--> 222 tokenizer_file.ParseFromString(open(f"{temporary_location}/tokenizer.model", "rb").read())
    224 # Now save the new tokenizer
    225 new_tokenizer.save_pretrained(temporary_location)

FileNotFoundError: [Errno 2] No such file or directory: '_unsloth_sentencepiece_temp/tokenizer.model'
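Reading the traceback, the crash happens on the map_eos_token path: get_chat_template tries to rewrite the EOS token by round-tripping a SentencePiece tokenizer.model file, and Llama 3's tokenizer is a fast BPE tokenizer that ships no such file. The snippet below is only a workaround sketch inferred from that traceback (leaving the EOS token unmapped so the SentencePiece branch is never reached); it is not the chat-template fix the maintainer was working on.

# Workaround sketch inferred from the traceback above (not the official fix):
# keep map_eos_token = False so get_chat_template never calls
# fix_sentencepiece_tokenizer, which needs a tokenizer.model file that
# Llama 3 does not ship. `tokenizer` is the one returned by
# FastLanguageModel.from_pretrained earlier in the notebook.
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml",
    mapping = {"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
    map_eos_token = False,  # skip the EOS remapping / SentencePiece path
)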
Sneakr commented 6 months ago

@danielhanchen

It's weird that I have this issue in both unsloth and LLaMA-Factory, the exact same error, and only for the Llama 3 models.

==((====))==  Unsloth: Fast Llama patching release 2024.4
   \\   /|    GPU: NVIDIA GeForce RTX 4090. Max memory: 23.988 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.1.2+cu121. CUDA = 8.9. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.25.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/home/workspace/unsl.py", line 53, in <module>
    model, tokenizer = FastLanguageModel.from_pretrained(
  File "/home/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/loader.py", line 132, in from_pretrained
    model, tokenizer = dispatch_model.from_pretrained(
  File "/home/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 1085, in from_pretrained
    tokenizer = load_correct_tokenizer(
  File "/home/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/tokenizer_utils.py", line 262, in load_correct_tokenizer
    fast_tokenizer.add_bos_token = slow_tokenizer.add_bos_token
AttributeError: 'PreTrainedTokenizerFast' object has no attribute 'add_bos_token'. Did you mean: '_bos_token'?

Edit: A complete reinstall solved it.

danielhanchen commented 6 months ago

@rwl4 Working on the chat template issues! Yep @Sneakr, a complete reinstall would work - sorry about the issues.

arunpatala commented 6 months ago

I have a doubt regarding Llama 3 finetuning. There are two versions of Llama 3 released: base and instruction-finetuned. Is the current Llama 3 model (unsloth/llama-3-8b-bnb-4bit) the base model or the instruction-tuned one? If it's the base model, will the instruction-tuned model also be added?

danielhanchen commented 6 months ago

@arunpatala Base model. The Instruct version is unsloth/llama-3-8b-Instruct-bnb-4bit. No, the base model is purely a pretrained model with no instruction finetuning.

arunpatala commented 6 months ago

Thanks for the information.

I am able to LoRA finetune with the Instruct model now.
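For readers landing here later, a rough sketch of that LoRA setup on the Instruct checkpoint named above; the rank, alpha and target-module choices are the commonly used defaults from unsloth's example notebooks and are assumptions here, not values specified in this thread.

# Rough LoRA finetuning setup on the Instruct checkpoint (hyperparameters are
# illustrative defaults, not values prescribed anywhere in this issue).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "unsloth/llama-3-8b-Instruct-bnb-4bit",
    max_seq_length = 2048,
    dtype          = None,
    load_in_4bit   = True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                              # LoRA rank
    lora_alpha = 16,
    lora_dropout = 0,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing = True,
)
# `model` can then be handed to an SFT trainer as in the Colab linked above.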

OxxoCodes commented 5 months ago

I'm noticing that non-quantized versions of Llama-3-70B don't seem to be available on Unsloth.

For example, both the non-quantized and the 4-bit quantized Llama-3-8B repos are listed, but for 70B only the 4-bit quantized model appears to be available.

Very new to Unsloth, so I may very well be missing something here!

danielhanchen commented 5 months ago

Sadly the non-quantized versions are near impossible to finetune in 16-bit on a single GPU anyway, so they're not uploaded.
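Rough arithmetic behind that, counting weights only (activations, gradients and optimizer state would add substantially more):

# Back-of-envelope weight memory for Llama-3-70B, weights only.
params = 70e9
print(f"16-bit weights: ~{params * 2 / 1e9:.0f} GB")   # ~140 GB
print(f" 4-bit weights: ~{params * 0.5 / 1e9:.0f} GB")  # ~35 GB

Even before LoRA adapters or activations, roughly 140 GB of 16-bit weights is far beyond a single GPU such as the 24 GB RTX 4090 mentioned earlier in this thread, whereas the 4-bit upload at roughly 35 GB is at least within reach of a single large-memory GPU.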