unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Qwen-2.5 Coder-7B-Instruct: ValueError: Unsloth: Untrained tokens found, but embed_tokens & lm_head not trainable, causing NaNs. Restart then add `embed_tokens` & `lm_head` to `FastLanguageModel.get_peft_model(target_modules = [..., "embed_tokens", "lm_head",]) #1053

Open dante3112 opened 1 month ago

dante3112 commented 1 month ago

I am trying to finetune Qwen-2.5 Coder-7B-Instruct on my custom dataset but am getting the following error:

ValueError: Unsloth: Untrained tokens of [[]] found, but embed_tokens & lm_head not trainable, causing NaNs. Restart then add `embed_tokens` & `lm_head` to `FastLanguageModel.get_peft_model(target_modules = [..., "embed_tokens", "lm_head",]). `Are you using the `base` model? Instead, use the `instruct` version to silence this warning.

I am getting this error with both Qwen-2.5 Coder-7B (base) and Qwen-2.5 Coder-7B-Instruct, while Mistral-Nemo-Instruct-2407-bnb-4bit works fine. I have updated the unsloth library as well.

Are there any workarounds for this, and why is this occurring?

My parameters:

from unsloth import FastLanguageModel

max_seq_length = 16000
dtype = None  # None for auto detection (float16 on older GPUs, bfloat16 on Ampere+)
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen2.5-Coder-7B-Instruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, 
    bias = "none",    
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,  
    loftq_config = None,
)
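
A rough way to see which tokens trip this check (an approximation, not Unsloth's exact internal test - it assumes never-trained embedding rows are all zeros) is to inspect the embedding rows of the tokenizer's special tokens right after loading:

import torch

embedding_matrix = model.get_input_embeddings().weight  # (vocab_size, hidden_dim)
for token in tokenizer.all_special_tokens:
    token_id = tokenizer.convert_tokens_to_ids(token)
    # Never-trained rows are typically exact zeros (or near-zero values).
    if torch.count_nonzero(embedding_matrix[token_id]) == 0:
        print(f"Possibly untrained token: {token!r} (id {token_id})")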
milsun commented 1 month ago

Were you able to solve it? I'm getting the same issue with a base model.

dante3112 commented 1 month ago

@milsun Nope, I tried the unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit model as well, but am still getting the same error.

paraschopra commented 1 month ago

I am getting a similar error too (using unsloth/Meta-Llama-3.1-8B-Instruct).

dante3112 commented 1 month ago

@danielhanchen Can you please help us out with this?

WasamiKirua commented 1 month ago

I had the same issue on the 14B model, and the only way I have found around it is to add "embed_tokens" and "lm_head" to the target modules, like so:

target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head",]

Of course this will require more time and more VRAM, but at least you will be able to run the trainer.
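
Applied to the get_peft_model call from the first post, that looks like the sketch below (all other hyperparameters unchanged; as noted above, making embed_tokens and lm_head trainable costs extra memory):

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head",],  # the two extra modules named in the error message
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)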

danielhanchen commented 1 month ago

Extreme apologies on the delay everyone - sorry!

@dante3112 @WasamiKirua I managed to fix the Instruct model issue - please update Unsloth via

pip uninstall unsloth -y
pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

@milsun I think the base model should be fine now.

@paraschopra Most likely some sort of chat template issue - i.e. some random tokens in the chat template are untrained.
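
One way to rule out a hand-rolled template is to install one of Unsloth's bundled templates on the tokenizer before formatting the dataset. A minimal sketch, assuming get_chat_template accepts the "chatml" template name (pick the name matching your model; the "conversations" dataset column is hypothetical):

from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml",  # Qwen models use ChatML-style <|im_start|>/<|im_end|> tokens
)

def formatting_prompts_func(examples):
    # Each example is assumed to hold a list of {"role": ..., "content": ...} messages.
    texts = [
        tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False)
        for convo in examples["conversations"]
    ]
    return {"text": texts}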

one-and-only commented 1 month ago

@danielhanchen This didn't fix the issue. I'm trying to train Mistral-12B-NeMo-Instruct.

one-and-only commented 1 month ago

It worked when I trained my first LoRA, but is now failing on my second dataset.

dante3112 commented 1 month ago

@danielhanchen Still getting the same error for "unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit", but training has started for "unsloth/Qwen2.5-Coder-7B-Instruct".

danielhanchen commented 1 month ago

:( Ok, will re-investigate - sorry about the issue.

wrisigo commented 1 week ago

Still running into this issue.

unsloth 2024.10.7, unsloth_zoo 2024.11.0

neph1 commented 1 day ago

Yes, it seems to be a chat template issue. I managed to get training working by removing tool calls.

This is the one I use:

"chat_template": "{%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n{%- else %}\n {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n{%- endif %}\n{%- for message in messages %}\n {%- if message.role == \"user\" or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.index0 == 0 or messages[loop.index0 - 1].role != \"tool\" %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' + message.content + '\\n</tool_response>' }}\n {%- if loop.last or messages[loop.index0 + 1].role != \"tool\" %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",

Edit: When I say "working", I mean "running". No verification of results, yet. I'll probably do a test run later today.

Edit2: Results and settings can be found here: https://huggingface.co/neph1/Qwen2.5-Coder-7B-Instruct-Unity