huggingface / autotrain-advanced

🤗 AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0

[BUG] Size Mismatch When Merging LoRA Model To Base Model #790

Closed wangzizhe closed 1 month ago

wangzizhe commented 1 month ago

Prerequisites

Backend

Hugging Face Space/Endpoints

Interface Used

UI

Error Logs

RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.weight: 
copying a param with shape torch.Size([151665, 3584]) from checkpoint, 
the shape in current model is torch.Size([152064, 3584]).

I get this error when I try to merge my fine-tuned LoRA adapter (https://huggingface.co/neighborwang/ModeliCo-7B) into the base model Qwen2.5-Coder-7B-Instruct.

It is the same size mismatch that many users have hit with base models like Qwen or Llama: https://github.com/huggingface/autotrain-advanced/issues/487. There is still no solution there, and I don't understand why that issue was closed.

I faced the same issue with Llama 3.1 and solved it there by pinning a specific transformers version, so for my adapter and Qwen2.5-Coder-7B-Instruct I tried the following transformers versions:

v4.45.1, v4.45.0, v4.44.0, v4.43.0, v4.37.0

But none of them work... I need some help.

I don't know why the fine-tuned adapter ends up with a different embedding size than the base model; I would expect AutoTrain to keep them in sync automatically during training.

Is this a bug or is this something which I did wrong?

Thanks a lot in advance.
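
(For later readers: the mismatch can be confirmed without attempting a merge, by comparing the base model's configured vocabulary size against the tokenizer saved next to the adapter. A minimal diagnostic sketch; the full base repo id is assumed to be Qwen/Qwen2.5-Coder-7B-Instruct:)

from transformers import AutoConfig, AutoTokenizer

BASE = "Qwen/Qwen2.5-Coder-7B-Instruct"  # assumed base repo id
ADAPTER = "neighborwang/ModeliCo-7B"

# Embedding rows the base model ships with (the 152064 in the error)
print("base vocab_size:", AutoConfig.from_pretrained(BASE).vocab_size)

# Tokenizer pushed with the adapter; if this prints 151665, the embeddings
# were resized during training and the base must be resized before merging
print("adapter tokenizer size:", len(AutoTokenizer.from_pretrained(ADAPTER)))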

abhishekkrthakur commented 1 month ago

does this error occur when merge_adapter is set to true in AutoTrain, or are you getting it when merging manually after the training?

wangzizhe commented 1 month ago

Thanks for your quick reply!

This happens after the fine-tuning process, when I try to merge them manually.

Regarding the merge process: I found the merge adapter Space here, but it is somewhat outdated and buggy. I modified it (code below) and successfully merged another adapter of mine into a Llama base model.

But when I use it to merge the Qwen2.5 model, I always get the same error shown in my description above.

import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

def merge(base_model, trained_adapter, token):
    # Load base model
    base = AutoModelForCausalLM.from_pretrained(
        base_model, torch_dtype=torch.float16, low_cpu_mem_usage=True, token=token
    )

    # Load adapter
    model = PeftModel.from_pretrained(base, trained_adapter, token=token)
    # Load the tokenizer; fall back to an explicit unk_token if loading hits a RecursionError
    try:
        tokenizer = AutoTokenizer.from_pretrained(base_model, token=token)
    except RecursionError:
        tokenizer = AutoTokenizer.from_pretrained(
            base_model, unk_token="<unk>", token=token
        )

    # Merge and unload the adapter
    model = model.merge_and_unload()

    print("Saving target model")
    model.push_to_hub(trained_adapter, token=token)
    tokenizer.push_to_hub(trained_adapter, token=token)

    return gr.Markdown("Model successfully merged and pushed! Please shutdown/pause this space")

with gr.Blocks() as demo:
    gr.Markdown("## AutoTrain Merge Adapter")
    gr.Markdown("Please duplicate this space and attach a GPU in order to use it.")

    token = gr.Textbox(
        label="Hugging Face Write Token", value="", lines=1, max_lines=1, interactive=True, type="password"
    )
    base_model = gr.Textbox(
        label="Base Model (e.g. meta-llama/Llama-2-7b-chat-hf)", value="", lines=1, max_lines=1, interactive=True
    )
    trained_adapter = gr.Textbox(
        label="Trained Adapter Model (e.g. username/autotrain-my-llama)", value="", lines=1, max_lines=1, interactive=True
    )

    submit = gr.Button(value="Merge & Push")
    op = gr.Markdown()

    submit.click(merge, inputs=[base_model, trained_adapter, token], outputs=[op])

if __name__ == "__main__":
    demo.launch()
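
(The script above loads the base model with its full original embedding matrix and never resizes it, which is exactly where the Qwen2.5 merge fails: the adapter checkpoint was saved against 151665 embedding rows, not 152064. A hedged patch, resizing the base model to the tokenizer that was pushed with the adapter before attaching it; a sketch, not the exact AutoTrain fix:)

# Inside merge(), after loading `base` and before PeftModel.from_pretrained:
adapter_tokenizer = AutoTokenizer.from_pretrained(trained_adapter, token=token)
# Match the base embeddings to the tokenizer the adapter was trained with
base.resize_token_embeddings(len(adapter_tokenizer))
model = PeftModel.from_pretrained(base, trained_adapter, token=token)
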
abhishekkrthakur commented 1 month ago

https://github.com/huggingface/autotrain-advanced/blob/943712619d70b686845d476e8b832bb9651ca97a/src/autotrain/tools/merge_adapter.py#L9

does this function also give you an error?

you can use autotrain tools to merge, which uses the code above:

[screenshot: the autotrain tools merge command]

could you try and let me know if even this gives an error?
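
(For reference, the linked merge_llm_adapter can also be called directly from Python instead of going through the CLI. A sketch under assumptions: the import path is taken from the link above, but the keyword names below are assumptions from that revision and may differ; check the file before relying on them:)

from autotrain.tools.merge_adapter import merge_llm_adapter  # import path per the link above

merge_llm_adapter(
    base_model_path="Qwen/Qwen2.5-Coder-7B-Instruct",  # assumed base repo id
    target_model_path="ModeliCo-7B-merged",            # hypothetical output location
    adapter_path="neighborwang/ModeliCo-7B",
    push_to_hub=False,
    token="hf_...",                                    # placeholder token
)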

wangzizhe commented 1 month ago

Thank you very much!

I'm using the Space https://huggingface.co/spaces/autotrain-projects/autotrain-advanced for no-code training, since I don't have a GPU on my local machine and the CPU is very slow.

Is there any possibility to use this merging tool within that Space?

abhishekkrthakur commented 1 month ago

i took a look at it and it seems to me that the adapter is merging fine. instead of merging later, could you please use the merge_adapter parameter and set it to true before training?
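
(For anyone replicating this: merge_adapter is a regular training parameter of the LLM trainer, set alongside peft. A hedged sketch of the relevant fragment as a plain dict of parameters; every other required field is elided and left at its default:)

# the two flags that matter here, shown as a plain dict of training parameters
training_params = {
    "peft": True,           # train a LoRA adapter
    "merge_adapter": True,  # fold the adapter into the base weights after training
}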

wangzizhe commented 1 month ago

Hi Abhishek, thank you very much. OK, in that case I will set it to true before training.

I'll close this issue for now; if I run into the same issue again, I will reopen it!

Thanks!

abhishekkrthakur commented 1 month ago

could you please confirm if it worked for you now? :)

wangzizhe commented 1 month ago

Hi Abhishek, I haven't tested it with Qwen2.5-Coder-7B yet. I have just trained a model based on StarCoder2-15B with merge_adapter=true, and it worked without any problem. I plan to fine-tune Qwen2.5-Coder-7B again this week or next using AutoTrain, and I will update and confirm whether it works.

wangzizhe commented 1 month ago

@abhishekkrthakur Just fine-tuned a Qwen2.5-Coder-7B using AutoTrain with merge_adapter=true. It worked without any problems. Thanks!
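
(A quick sanity check after such a run: the merged repo should load standalone, with no peft involved. A sketch; the repo id below is hypothetical:)

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "neighborwang/my-merged-qwen"  # hypothetical merged output repo
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto")
tok = AutoTokenizer.from_pretrained(repo)

print("embedding rows:", model.get_input_embeddings().weight.shape[0])
print("tokenizer size:", len(tok))  # rows may exceed this if padding was applied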

abhishekkrthakur commented 1 month ago

great. thank you for confirming.