huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

the lack of adapter_model.bin and adapter_config.json after fine-tuning #2211

Closed · TracyGuo2001 closed this issue 2 weeks ago

TracyGuo2001 commented 2 weeks ago

System Info

peft 0.13.3.dev0

Who can help?

No response

Information

Tasks

Reproduction

After calling model.save_pretrained(script_args.output_dir), I didn't get adapter_model.bin and adapter_config.json.

I then checked peft/mixed_model.py and found that its save_pretrained() has no implementation.

Expected behavior

I want the final adapter_model.bin and adapter_config.json after fine-tuning.

JINO-ROHIT commented 2 weeks ago

@TracyGuo2001 which PEFT method are you using? Can you share a small snippet?

TracyGuo2001 commented 2 weeks ago

I use LoRA. Here is the code. Thank you very much.

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=script_args.target_modules.split(','),
    inference_mode=False,
    r=script_args.lora_rank,
    lora_alpha=script_args.lora_alpha,
    lora_dropout=script_args.lora_dropout,
    init_lora_weights=script_args.init_lora_weights,
)
model = get_peft_model(model, peft_config)
# ...... (train)
model.save_pretrained(script_args.output_dir)

JINO-ROHIT commented 2 weeks ago

Can you try this? @TracyGuo2001

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=script_args.target_modules.split(','),
    inference_mode=False,
    r=script_args.lora_rank,
    lora_alpha=script_args.lora_alpha,
    lora_dropout=script_args.lora_dropout,
    init_lora_weights=script_args.init_lora_weights,
)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.add_adapter(peft_config)
# ......... (train)

githubnemo commented 2 weeks ago

@TracyGuo2001 are you sure that model is not overwritten between the call to get_peft_model and model.save_pretrained? It would be helpful to have the output of print(model) and possibly the code so that we can take a look.
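
For reference, a minimal way to check this right before saving, reusing the variable names from the snippet above (a sketch, not part of the original script):

from peft import PeftModel

# If `model` was overwritten after get_peft_model, it is no longer a
# PeftModel and save_pretrained will not write adapter_config.json or
# the adapter weights.
print(type(model))
assert isinstance(model, PeftModel)
model.save_pretrained(script_args.output_dir)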

TracyGuo2001 commented 2 weeks ago

> Can you try this? @TracyGuo2001
>
> peft_config = LoraConfig(
>     task_type=TaskType.CAUSAL_LM,
>     target_modules=script_args.target_modules.split(','),
>     inference_mode=False,
>     r=script_args.lora_rank,
>     lora_alpha=script_args.lora_alpha,
>     lora_dropout=script_args.lora_dropout,
>     init_lora_weights=script_args.init_lora_weights,
> )
> model = AutoModelForCausalLM.from_pretrained(model_id)
> model.add_adapter(peft_config)
> # ......... (train)

I can load the model with from_pretrained, but I get an error at model.add_adapter(peft_config): AttributeError: can't set attribute. How could this happen? It's so strange.

But after model = get_peft_model(model, peft_config), I do get a model with LoRA applied; this is the output of print(model).

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32000, 4096, padding_idx=0)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=128, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=128, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=128, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=128, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (v_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=128, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=128, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (o_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=128, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=128, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (rotary_emb): LlamaRotaryEmbedding()
            )
            (mlp): LlamaMLP(
              (gate_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=11008, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=128, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=128, out_features=11008, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (up_proj): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=11008, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=128, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=128, out_features=11008, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (down_proj): lora.Linear(
                (base_layer): Linear(in_features=11008, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=11008, out_features=128, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=128, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (act_fn): SiLUActivation()
            )
            (input_layernorm): LlamaRMSNorm()
            (post_attention_layernorm): LlamaRMSNorm()
          )
        )
        (norm): LlamaRMSNorm()
      )
      (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
    )
  )
)

Thank you for your answer!!

JINO-ROHIT commented 2 weeks ago

Hi @TracyGuo2001, sorry, my bad, I forgot to mention that you first need to convert the model into a PEFT model, so try this:

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=script_args.target_modules.split(','),
    inference_mode=False,
    r=script_args.lora_rank,
    lora_alpha=script_args.lora_alpha,
    lora_dropout=script_args.lora_dropout,
    init_lora_weights=script_args.init_lora_weights,
)
model = AutoModelForCausalLM.from_pretrained(model_id)
model = get_peft_model(model, peft_config)
model.add_adapter(peft_config=peft_config, adapter_name="xxx")
# ......... (train)

lmk if this works

TracyGuo2001 commented 2 weeks ago

> Hi @TracyGuo2001, sorry, my bad, I forgot to mention that you first need to convert the model into a PEFT model, so try this:
>
> peft_config = LoraConfig(
>     task_type=TaskType.CAUSAL_LM,
>     target_modules=script_args.target_modules.split(','),
>     inference_mode=False,
>     r=script_args.lora_rank,
>     lora_alpha=script_args.lora_alpha,
>     lora_dropout=script_args.lora_dropout,
>     init_lora_weights=script_args.init_lora_weights,
> )
> model = AutoModelForCausalLM.from_pretrained(model_id)
> model = get_peft_model(model, peft_config)
> model.add_adapter(peft_config=peft_config, adapter_name="xxx")
> # ......... (train)
>
> lmk if this works

I tried as below.

model = transformers.AutoModelForCausalLM.from_pretrained(
    script_args.model_name_or_path,
    ………………
)
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=script_args.target_modules.split(','),
    inference_mode=False,
    r=script_args.lora_rank,
    lora_alpha=script_args.lora_alpha,
    lora_dropout=script_args.lora_dropout,
    init_lora_weights=script_args.init_lora_weights,
)
model = get_peft_model(model, peft_config)
model.add_adapter(peft_config=peft_config, adapter_name="default")
print(model)

The code ran without any issues, but the output directory still does not contain adapter_config.json and adapter_model.bin; those two files only exist in the checkpoint folders.

JINO-ROHIT commented 2 weeks ago

if you want to save them you can use this

model.save_pretrained(save_dir)

and then load back using

model = AutoModelForCausalLM.from_pretrained(save_dir)
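
For completeness, the adapter-only checkpoint can also be reloaded explicitly with PEFT; a sketch, where save_dir and script_args.model_name_or_path come from the snippets above:

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model first, then attach the saved adapter from save_dir.
base = AutoModelForCausalLM.from_pretrained(script_args.model_name_or_path)
model = PeftModel.from_pretrained(base, save_dir)
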
TracyGuo2001 commented 2 weeks ago

> if you want to save them you can use this
>
> model.save_pretrained(save_dir)
>
> and then load back using
>
> model = AutoModelForCausalLM.from_pretrained(save_dir)

I apologize if I wasn't clear enough. I tried the suggested method, but after fine-tuning those two files are still missing: as shown above, the output/TEST directory does not contain them. Below is my save code; the if condition evaluates to True, so it does enter that block:

trainer.train(resume_from_checkpoint=resume_from_checkpoint_dir)
trainer.save_state()
if script_args.use_lora and script_args.merge:
    model = model.merge_and_unload()
    model.save_pretrained(script_args.output_dir)
    tokenizer.save_pretrained(script_args.output_dir)

JINO-ROHIT commented 2 weeks ago

I think this is expected, right? You are training the adapter for N steps, so you will have them in the checkpoint folders.

What are you trying to do next?

BenjaminBossan commented 2 weeks ago

> model = get_peft_model(model, peft_config)
> model.add_adapter(peft_config=peft_config, adapter_name="default")

Note that the second line is unnecessary: get_peft_model already creates a default adapter for you; add_adapter is only needed if you want more than one adapter.
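
For illustration, a rough sketch of the multi-adapter case where add_adapter would actually be needed (the second config and the adapter name "second" are made up for the example):

from peft import LoraConfig, TaskType, get_peft_model

# get_peft_model registers peft_config as the "default" adapter.
model = get_peft_model(model, peft_config)

# add_adapter is only needed for an additional, differently named adapter.
second_config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16)
model.add_adapter("second", second_config)

# Choose which adapter is active for the forward pass / training.
model.set_adapter("second")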

> Below is my save code, and the if condition evaluates to True, so it does enter that block:
>
> trainer.train(resume_from_checkpoint=resume_from_checkpoint_dir)
> trainer.save_state()
> if script_args.use_lora and script_args.merge:
>     model = model.merge_and_unload()
>     model.save_pretrained(script_args.output_dir)
>     tokenizer.save_pretrained(script_args.output_dir)

The issue is the model = model.merge_and_unload() line. When you call this, you're asking PEFT to merge the adapter weights into the base weights and then return the merged model (not the PEFT model!). Therefore, this new model variable is a normal transformers model and when you call save_pretrained on that, it will save the full weights. If you want to save just the adapter weights and the adapter config, you need to call save_pretrained before calling merge_and_unload.

In fact, you most likely don't need to call merge_and_unload at all in this context. Its main use case is inference, where merging avoids the runtime overhead of the adapter.
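
Put as code, the order described above might look roughly like this (a sketch based on the snippet quoted earlier; the "merged" subdirectory is just an example):

import os

# 1. Save the adapter while `model` is still a PeftModel: this writes
#    adapter_config.json and the adapter weights to output_dir.
model.save_pretrained(script_args.output_dir)
tokenizer.save_pretrained(script_args.output_dir)

# 2. Only if a standalone model is needed (e.g. for inference), merge
#    afterwards; merge_and_unload returns a plain transformers model.
if script_args.merge:
    merged = model.merge_and_unload()
    merged.save_pretrained(os.path.join(script_args.output_dir, "merged"))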

TracyGuo2001 commented 2 weeks ago

> I think this is expected, right? You are training the adapter for N steps, so you will have them in the checkpoint folders.
>
> What are you trying to do next?

> model = get_peft_model(model, peft_config)
> model.add_adapter(peft_config=peft_config, adapter_name="default")
>
> Note that the second line is unnecessary: get_peft_model already creates a default adapter for you; add_adapter is only needed if you want more than one adapter.
>
> Below is my save code, and the if condition evaluates to True, so it does enter that block:
>
> trainer.train(resume_from_checkpoint=resume_from_checkpoint_dir)
> trainer.save_state()
> if script_args.use_lora and script_args.merge:
>     model = model.merge_and_unload()
>     model.save_pretrained(script_args.output_dir)
>     tokenizer.save_pretrained(script_args.output_dir)
>
> The issue is the model = model.merge_and_unload() line. When you call this, you're asking PEFT to merge the adapter weights into the base weights and then return the merged model (not the PEFT model!). Therefore, this new model variable is a normal transformers model and when you call save_pretrained on that, it will save the full weights. If you want to save just the adapter weights and the adapter config, you need to call save_pretrained before calling merge_and_unload.
>
> In fact, you most likely don't need to call merge_and_unload at all in this context. Its main use case is inference, where merging avoids the runtime overhead of the adapter.

I figured it out and fixed it! Thanks! Thank you very much!

BenjaminBossan commented 2 weeks ago

Great that you could solve the issue. I assume it was indeed caused by the merge_and_unload call? I'll close the issue then, feel free to re-open if you have more questions.

TracyGuo2001 commented 2 weeks ago

> Great that you could solve the issue. I assume it was indeed caused by the merge_and_unload call? I'll close the issue then, feel free to re-open if you have more questions.

YES thanks a lot!