unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Train Llama3 8B error: AttributeError: 'PreTrainedTokenizerFast' object has no attribute 'add_bos_token'. #354

Open ch-tseng opened 4 months ago

ch-tseng commented 4 months ago

I got this error while trying to train Llama 3 8B. I used the demo script from Alpaca + Llama-3 8b full example.ipynb.

==((====))==  Unsloth: Fast Llama patching release 2024.4
   \\   /|    GPU: NVIDIA GeForce RTX 3090. Max memory: 23.691 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.2+cu121. CUDA = 8.6. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.25.post1. FA = True.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

Traceback (most recent call last):
  File "/GPUData/working/unsloth/train_llama3_8b.py", line 23, in <module>
    model, tokenizer = FastLanguageModel.from_pretrained(
  File "/home/chtseng/envs/LM/lib/python3.10/site-packages/unsloth/models/loader.py", line 132, in from_pretrained
    model, tokenizer = dispatch_model.from_pretrained(
  File "/home/chtseng/envs/LM/lib/python3.10/site-packages/unsloth/models/llama.py", line 1107, in from_pretrained
    tokenizer = load_correct_tokenizer(
  File "/home/chtseng/envs/LM/lib/python3.10/site-packages/unsloth/tokenizer_utils.py", line 289, in load_correct_tokenizer
    fast_tokenizer.add_bos_token = slow_tokenizer.add_bos_token
AttributeError: 'PreTrainedTokenizerFast' object has no attribute 'add_bos_token'. Did you mean: '_bos_token'?

iskenderulgen commented 4 months ago

I faced the same issue; I'm not sure if it's related to the Hugging Face transformers library or to Unsloth.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

base_model_id = "Meta-Llama-3-8B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config, device_map="auto")

With the transformers 4.40 library, loading via AutoModelForCausalLM works around the issue, but it is more beneficial to use Unsloth for LoRA fine-tuning.
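For reference, a minimal sketch of what the Unsloth path can look like once the reinstall fix below is applied; the checkpoint name and LoRA hyperparameters here are illustrative assumptions, not values from this thread:

import torch
from unsloth import FastLanguageModel

# Load the 4-bit model and its tokenizer in a single call.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # assumed example checkpoint
    max_seq_length = 2048,                       # assumed value
    dtype = torch.bfloat16,
    load_in_4bit = True,
)

# Attach LoRA adapters for parameter-efficient fine-tuning.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
)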

danielhanchen commented 4 months ago

@iskenderulgen @ch-tseng Sorry about the issue and the delay! You need to uninstall Unsloth, then reinstall it:

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
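After reinstalling and restarting the Python session, a quick sanity check (just a suggestion, not part of the official fix) is to confirm which Unsloth build is actually installed in the active environment:

from importlib.metadata import version
print(version("unsloth"))  # should report the freshly reinstalled build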

ashleylew commented 4 months ago

Uninstalling and reinstalling unfortunately didn't work for me; I'm still getting the same error! Any further suggestions? Here's the script I'm trying to use, if that's helpful:

import torch
from transformers import PreTrainedTokenizerFast
from unsloth import FastLanguageModel
import json

def load_model_and_tokenizer(model_path):
    model = FastLanguageModel.from_pretrained(model_path)[0]  # Assuming the model is the first element in the tuple
    FastLanguageModel.for_inference(model)  # Correctly call the for_inference method
    tokenizer = PreTrainedTokenizerFast.from_pretrained(model_path)

    return model, tokenizer

def generate_text(model, tokenizer, instruction, conversation_history, input_text):
    prompt = f"### Instruction:\n{instruction}\n### Conversation History:\n{conversation_history}\n### Input:\n{input_text}\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=True).to("cuda")
    input_ids = inputs["input_ids"]  # Explicitly use only input_ids

    # Generate text without passing extra kwargs
    outputs = model.generate(input_ids, max_new_tokens=64, use_cache=True)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

if __name__ == "__main__":
    # model_path = 'Ash/gemma/regular_english/checkpoint-3170'
    # model_path = 'unsloth/gemma-7b-bnb-4bit'
    # model_path = 'unsloth/llama-2-7b-bnb-4bit'
    model_path = 'unsloth/llama-3-8b-bnb-4bit'

    model, tokenizer = load_model_and_tokenizer(model_path)

    #### Example use
    # instruction = "Write the guide's next turn based on the information in the following document.  /n/ /n/ DOCUMENT: /n/ /n/PLANETARIUM /n/ The COSI Planetarium\u2014the largest in Ohio\u2014features state-of-the-art digital technology that offers an unsurpassed glimpse of our incredible universe. The COSI Planetarium\u2019s Digistar 7 projection system and 60-foot dome will transport you to the farthest reaches of the galaxy, to undersea volcanoes and distant lands, and even into the human body. For all who wonder, who question, who dream, your window to the universe is now open at COSI."
    # input_text = "User: I am interested in the Planetarium exhibit. Can you please tell me more about it?"

    # output = generate_text(model, tokenizer, instruction, input_text)
    # print("Generated Sequence:", output)

    #### Run dev set
    dataset_path = '/english_version/llama_english_dev_set_ENGLISH_DOCS.json'

    with open(dataset_path, 'r') as file:
        dataset = json.load(file)

    # Initialize a list to store results
    results = []

    for entry in dataset:
        instruction = entry['instruction']
        input_text = entry['input']
        conversation_history_raw = entry['history']
        conversation_history = ''
        for pair in conversation_history_raw:
            if pair[0] == '':
                conversation_history = conversation_history + pair[1] + '\n'
            else:
                conversation_history = conversation_history + pair[0] + '\n' + pair[1] + '\n'
        conversation_history = conversation_history[:-1]
        output = generate_text(model, tokenizer, instruction, conversation_history, input_text)
        results.append(output)
        print("Generated Sequence:", output)
    with open('/Unsloth/unsloth/unsloth/output_files/generated_predictions_LLAMA3_PRETRAIN.json', 'w', encoding='utf-8') as f:
        json.dump(results, f, ensure_ascii=False)

danielhanchen commented 4 months ago

@ch-tseng Oh no, that's not good :( Did you manage to restart the Python terminal itself? (I'm assuming yes?) It's very weird that it didn't work.

ashleylew commented 4 months ago

Yeah I did! I tried doing a complete reinstall as well in a new conda environment and that didn't work either. I'm wondering if the way I adapted the code has something to do with it? But I can't really imagine what the issue is. I'll keep trying and let you know if I get it to work.

iskenderulgen commented 4 months ago

Hi @ashleylew, after I reinstalled Unsloth I haven't faced any problems. Can you load the model with the given example:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "",
    max_seq_length = (int),
    dtype = torch.bfloat16,
    load_in_4bit= True,
    device_map = 'auto',
)

As I can see in your example, you load the model and try to extract the model by indexing:

model = FastLanguageModel.from_pretrained(model_path)[0]

You can simply load the model and tokenizer together and modify from there.
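For example, the load_model_and_tokenizer helper from the script above could be adapted along these lines (a minimal sketch; max_seq_length here is an assumed value):

from unsloth import FastLanguageModel

def load_model_and_tokenizer(model_path):
    # from_pretrained returns (model, tokenizer); unpack both instead of
    # indexing [0] and loading the tokenizer separately.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = model_path,
        max_seq_length = 2048,  # assumed value
        load_in_4bit = True,
    )
    FastLanguageModel.for_inference(model)  # enable inference mode, as in the original script
    return model, tokenizer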

ashleylew commented 4 months ago

I tried that and no luck, unfortunately!! Thanks so much for the suggestion though.

Mohamed-E-Fayed commented 3 months ago

tokenizer.add_bos_token is a boolean value in the configuration of some pre-trained models. You may either add it or ignore it. It may be important in some custom logic, not in the Hugging Face transformers library itself, as you can see from the traceback.

You may add it to tokenizer_config.json. Check that attribute in the facebook/opt-6.7b model, for instance.
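A minimal sketch of that workaround, assuming you have a local copy of the tokenizer files (the path below is hypothetical):

import json
import os

config_path = os.path.join("path/to/local/model", "tokenizer_config.json")  # hypothetical local path

with open(config_path, "r", encoding="utf-8") as f:
    config = json.load(f)

# Add the flag the loader expects; Llama models normally prepend a BOS token.
config.setdefault("add_bos_token", True)

with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, ensure_ascii=False, indent=2)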

Hope that helps.