huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

load a model with unmerged lora adapters #1934

Closed Ledzy closed 1 week ago

Ledzy commented 1 month ago

Feature request

For my use case, I need to train both a model's LoRA modules and its original weights. The model is saved periodically during training. However, if I directly save the model with unmerged LoRA adapters, I cannot load it again: the from_pretrained method seems to support only 1) loading a pretrained model and then loading adapters on top of it, or 2) loading a checkpoint whose parameters exactly match the pretrained model (and which therefore cannot contain any LoRA modules). Neither case matches my need.

Therefore, I am wondering whether peft/transformers could support loading a model with unmerged LoRA adapters?
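To make the two supported paths concrete, here is a minimal sketch (the model and checkpoint paths are placeholders):

from transformers import AutoModelForCausalLM
from peft import PeftModel

# 1) Load the pretrained base model, then attach a saved adapter on top.
base = AutoModelForCausalLM.from_pretrained("base-model")
model = PeftModel.from_pretrained(base, "adapter-checkpoint")

# 2) Load a checkpoint whose parameters exactly match the pretrained
#    architecture, i.e. one that contains no LoRA modules.
model = AutoModelForCausalLM.from_pretrained("full-checkpoint")

# A checkpoint saved with unmerged LoRA adapters fits neither path.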

Motivation

This option seems very natural and is useful for research.

Your contribution

I am actively working on this feature but haven't figured out a clean approach yet. I will open a PR once I finish it. Any assistance or suggestions are appreciated.

BenjaminBossan commented 1 month ago

Did you use the modules_to_save feature to fully fine-tune those original weights? E.g., config = LoraConfig(..., modules_to_save=["layer1", "layer2"]) makes it so that "layer1" and "layer2" are fully fine-tuned. PEFT creates a copy of the original weights for that, so the base model remains untouched. When you save the adapter, those copies are included in the checkpoint.
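A minimal sketch of that setup (the checkpoint and module names are placeholders; adapt them to your model):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("base-model")
config = LoraConfig(
    r=8,
    target_modules=["q_proj", "v_proj"],  # modules that get LoRA adapters
    modules_to_save=["lm_head"],          # modules that are fully fine-tuned
)
model = get_peft_model(base, config)
model.save_pretrained("adapter-dir")  # includes the modules_to_save copies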

Ledzy commented 1 month ago

Thanks for the reply. I'm directly tuning the original model to reduce memory consumption. I managed to solve the issue by moving the model to the CPU, copying it, and applying merge_and_unload on the copy:

import copy
from types import MethodType

def custom_save_pretrained(self: "PeftModelForCausalLM", *args, **kwargs):
    lora_model: "LoraModel" = self.base_model
    gpu_device = lora_model.device

    # Move to CPU first so the deepcopy does not double the GPU memory usage.
    lora_model.cpu()
    model_to_save = copy.deepcopy(lora_model)
    # merge_and_unload returns the base model with the LoRA weights merged in.
    model_to_save = model_to_save.merge_and_unload()
    model_to_save.save_pretrained(*args, **kwargs)

    # Move the original model back to the GPU.
    lora_model.to(gpu_device)

model.save_pretrained = MethodType(custom_save_pretrained, model)

However, this only works in the single-GPU case. When using model parallelism, e.g. DeepSpeed ZeRO-3, copy.deepcopy(lora_model) throws an error:

 File "/home/ledzy/code/BAdam/src/badam/utils.py", line 63, in save_pretrained
    model_to_save.save_pretrained(*args, **kwargs)
                            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ledzy/anaconda3/envs/llama_factory/lib/python3.11/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ledzy/anaconda3/envs/llama_factory/lib/python3.11/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ledzy/anaconda3/envs/llama_factory/lib/python3.11/copy.py", line 146, in deepcopy
    y = copier(x, memo)
        ^^^^^^^^^^^^^^^
  File "/home/ledzy/anaconda3/envs/llama_factory/lib/python3.11/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
                             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ledzy/anaconda3/envs/llama_factory/lib/python3.11/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ledzy/anaconda3/envs/llama_factory/lib/python3.11/copy.py", line 265, in _reconstruct
    y = func(*args)
        ^^^^^^^^^^^
TypeError: ZeROOrderedDict.__init__() missing 1 required positional argument: 'parent_module'

which seems non-trivial to resolve and is related to how DeepSpeed keeps references to the variables it creates.

BenjaminBossan commented 1 month ago

I'm directly tuning the original model to reduce memory consumption.

I see, yes, using modules_to_save will incur a bit of extra memory. Maybe you could try manually moving the original parameters to CPU if you don't need them, in which case the memory should be back to what it was initially.
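If it helps, a rough sketch of that idea, assuming the ModulesToSaveWrapper class that PEFT wraps such modules in and its original_module attribute (both are internal details, so treat this as untested):

from peft.utils import ModulesToSaveWrapper

# Move the frozen original copies kept by modules_to_save off the GPU;
# only the trainable replacement copies stay on the device.
for module in model.modules():
    if isinstance(module, ModulesToSaveWrapper):
        module.original_module.to("cpu")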

When using model parallelism, e.g. DeepSpeed ZeRO-3, copy.deepcopy(lora_model) throws an error:

Yeah, the weights are sharded, so you'd have to gather them all before being able to deepcopy them. I'm actually not sure whether moving the model to CPU works under DeepSpeed. You could try using the deepspeed.zero.GatheredParameters context and see if that resolves the error.
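Something like the following, as an untested sketch (the modifier_rank argument and the rank-0 guard are assumptions, and whether the deepcopy then succeeds under ZeRO-3 is exactly what would need checking):

import copy
import torch.distributed as dist
import deepspeed

# Temporarily gather the ZeRO-3 shards into full parameters.
params = list(lora_model.parameters())
with deepspeed.zero.GatheredParameters(params, modifier_rank=0):
    if dist.get_rank() == 0:
        # With full parameters available, the original recipe applies again.
        model_to_save = copy.deepcopy(lora_model).cpu()
        merged = model_to_save.merge_and_unload()
        merged.save_pretrained("checkpoint-dir")  # placeholder path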

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.