@TracyGuo2001 what is the peft method you are using? can you share a small snippet?
I use LoRA. Here is the code. Thank you very much.
peft_config = LoraConfig(
task_type=TaskType.CAUSAL_LM,
target_modules=script_args.target_modules.split(','),
inference_mode=False,
r=script_args.lora_rank,
lora_alpha=script_args.lora_alpha,
lora_dropout=script_args.lora_dropout,
init_lora_weights=script_args.init_lora_weights,
)
model = get_peft_model(model, peft_config)
......(train)
model.save_pretrained(script_args.output_dir)
Can you try this, @TracyGuo2001?
peft_config = LoraConfig(
task_type=TaskType.CAUSAL_LM,
target_modules=script_args.target_modules.split(','),
inference_mode=False,
r=script_args.lora_rank,
lora_alpha=script_args.lora_alpha,
lora_dropout=script_args.lora_dropout,
init_lora_weights=script_args.init_lora_weights,
)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.add_adapter(peft_config)
[train].........
@TracyGuo2001 are you sure that model is not overwritten between the call to get_peft_model and model.save_pretrained? It would be helpful to have the output of print(model), and possibly the code, so that we can take a look.
I can get the model with from_pretrained, but I got an ERROR when calling model.add_adapter(peft_config): AttributeError: can't set attribute. How could this happen? It's so strange.
But after model = get_peft_model(model, peft_config), I can get a model with LoRA. This is the output of print(model):
PeftModelForCausalLM(
(base_model): LoraModel(
(model): LlamaForCausalLM(
(model): LlamaModel(
(embed_tokens): Embedding(32000, 4096, padding_idx=0)
(layers): ModuleList(
(0-31): 32 x LlamaDecoderLayer(
(self_attn): LlamaAttention(
(q_proj): lora.Linear(
(base_layer): Linear(in_features=4096, out_features=4096, bias=False)
(lora_dropout): ModuleDict(
(default): Identity()
)
(lora_A): ModuleDict(
(default): Linear(in_features=4096, out_features=128, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=128, out_features=4096, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(k_proj): lora.Linear(
(base_layer): Linear(in_features=4096, out_features=4096, bias=False)
(lora_dropout): ModuleDict(
(default): Identity()
)
(lora_A): ModuleDict(
(default): Linear(in_features=4096, out_features=128, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=128, out_features=4096, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(v_proj): lora.Linear(
(base_layer): Linear(in_features=4096, out_features=4096, bias=False)
(lora_dropout): ModuleDict(
(default): Identity()
)
(lora_A): ModuleDict(
(default): Linear(in_features=4096, out_features=128, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=128, out_features=4096, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(o_proj): lora.Linear(
(base_layer): Linear(in_features=4096, out_features=4096, bias=False)
(lora_dropout): ModuleDict(
(default): Identity()
)
(lora_A): ModuleDict(
(default): Linear(in_features=4096, out_features=128, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=128, out_features=4096, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(rotary_emb): LlamaRotaryEmbedding()
)
(mlp): LlamaMLP(
(gate_proj): lora.Linear(
(base_layer): Linear(in_features=4096, out_features=11008, bias=False)
(lora_dropout): ModuleDict(
(default): Identity()
)
(lora_A): ModuleDict(
(default): Linear(in_features=4096, out_features=128, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=128, out_features=11008, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(up_proj): lora.Linear(
(base_layer): Linear(in_features=4096, out_features=11008, bias=False)
(lora_dropout): ModuleDict(
(default): Identity()
)
(lora_A): ModuleDict(
(default): Linear(in_features=4096, out_features=128, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=128, out_features=11008, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(down_proj): lora.Linear(
(base_layer): Linear(in_features=11008, out_features=4096, bias=False)
(lora_dropout): ModuleDict(
(default): Identity()
)
(lora_A): ModuleDict(
(default): Linear(in_features=11008, out_features=128, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=128, out_features=4096, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(act_fn): SiLUActivation()
)
(input_layernorm): LlamaRMSNorm()
(post_attention_layernorm): LlamaRMSNorm()
)
)
(norm): LlamaRMSNorm()
)
(lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
)
)
Thank you for your answer!!
Hi @TracyGuo2001, sorry, my bad, I forgot to mention that you first need to convert it into a PEFT model, so try this:
peft_config = LoraConfig(
task_type=TaskType.CAUSAL_LM,
target_modules=script_args.target_modules.split(','),
inference_mode=False,
r=script_args.lora_rank,
lora_alpha=script_args.lora_alpha,
lora_dropout=script_args.lora_dropout,
init_lora_weights=script_args.init_lora_weights,
)
model = AutoModelForCausalLM.from_pretrained(model_id)
model = get_peft_model(model, peft_config)
model.add_adapter(peft_config = peft_config, adapter_name="xxx")
[train].........
lmk if this works
I tried as below:
model = transformers.AutoModelForCausalLM.from_pretrained(
    script_args.model_name_or_path,
    ………………
)
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=script_args.target_modules.split(','),
    inference_mode=False,
    r=script_args.lora_rank,
    lora_alpha=script_args.lora_alpha,
    lora_dropout=script_args.lora_dropout,
    init_lora_weights=script_args.init_lora_weights,
)
model = get_peft_model(model, peft_config)
model.add_adapter(peft_config=peft_config, adapter_name="default")
print(model)
The code ran without any issues, but the generated output still does not contain adapter_config.json and adapter_model.bin; these two only exist in the checkpoint folders.
If you want to save them, you can use this:
model.save_pretrained(save_dir)
and then load it back using:
model = AutoModelForCausalLM.from_pretrained(save_dir)
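For reference, a minimal sketch of that save/load round trip; the model id "my-base-model", the target-module list, and the "lora-out" directory are placeholders, not names from this thread:

# Minimal sketch of saving adapter-only weights and loading them back.
# "my-base-model", the target modules, and "lora-out" are placeholders.
from transformers import AutoModelForCausalLM
from peft import AutoPeftModelForCausalLM, LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("my-base-model")
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed example modules
)
model = get_peft_model(base, peft_config)

# ... train ...

# While model is still a PeftModel, this writes only the adapter files:
# adapter_config.json plus the adapter weights (adapter_model.bin or .safetensors).
model.save_pretrained("lora-out")

# AutoPeftModelForCausalLM reads adapter_config.json, loads the base model
# recorded there, and attaches the adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained("lora-out")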
I apologize if I wasn't clear enough. I tried the suggested method, but after fine-tuning, those two files are still missing. As shown in the previous image, the output/TEST directory does not contain those files. Below is my save code, and the if condition evaluates to True, so it does enter that block:
trainer.train(resume_from_checkpoint=resume_from_checkpoint_dir)
trainer.save_state()
if script_args.use_lora and script_args.merge:
model = model.merge_and_unload()
model.save_pretrained(script_args.output_dir)
tokenizer.save_pretrained(script_args.output_dir)
I think this is expected, right? You are training the adapter for N steps, so you will have them in the checkpoint folders.
What are you trying to do next?
model = get_peft_model(model, peft_config)
model.add_adapter(peft_config=peft_config, adapter_name="default")
Note that the second line is unnecessary; get_peft_model already creates a default adapter for you. add_adapter is only needed if you want more than one adapter.
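To illustrate, a small sketch (with a placeholder model id and an assumed target-module list, not the exact script from this thread):

# Sketch only: get_peft_model already registers a "default" adapter, so
# add_adapter is only needed for an additional, differently named adapter.
# "my-base-model" and the target modules below are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("my-base-model")
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
)

model = get_peft_model(base, config)
print(list(model.peft_config))  # ['default'] -- created by get_peft_model
print(model.active_adapter)     # 'default'

# Only if you actually want a second adapter under another name:
# model.add_adapter("second", config)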
As for your save code: the issue is the model = model.merge_and_unload() line. When you call this, you're asking PEFT to merge the adapter weights into the base weights and then return the merged model (not the PEFT model!). Therefore, this new model variable is a normal transformers model, and when you call save_pretrained on that, it will save the full weights. If you want to save just the adapter weights and the adapter config, you need to call save_pretrained before calling merge_and_unload.
In fact, you most likely don't need to call merge_and_unload at all in this context. Its main use case is inference, when you want to avoid the runtime overhead of the adapter.
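Concretely, a rough sketch of that ordering, adapted from your snippet above (the "merged-out" directory is just a placeholder):

# Sketch of the ordering described above, adapted from the user's snippet;
# "merged-out" is a placeholder directory.
trainer.train(resume_from_checkpoint=resume_from_checkpoint_dir)
trainer.save_state()

# 1. Save the adapter while model is still a PeftModel: this writes only
#    adapter_config.json and the adapter weights.
model.save_pretrained(script_args.output_dir)
tokenizer.save_pretrained(script_args.output_dir)

# 2. Optional: only if a standalone merged checkpoint is wanted (e.g. for
#    inference without the adapter overhead). merge_and_unload() returns a
#    plain transformers model, so saving it writes the full model weights,
#    not an adapter.
if script_args.use_lora and script_args.merge:
    merged = model.merge_and_unload()
    merged.save_pretrained("merged-out")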
I figured it out and fixed it! Thanks! Thank you very much!
Great that you could solve the issue. I assume it was indeed caused by the merge_and_unload call? I'll close the issue then; feel free to re-open if you have more questions.
YES thanks a lot!
System Info
peft 0.13.3.dev0
Who can help?
No response
Information
Tasks
examples folder
Reproduction
After using model.save_pretrained(script_args.output_dir), I didn't get adapter_model.bin and adapter_config.json.
Then I checked /peft/mixed_model.py; its save_pretrained() is as below, without an implementation.
Expected behavior
I want the final adapter_model.bin and adapter_config.json after fine-tuning.