huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

LoRA MergedLinear layer merging doesn't work #492

Closed. JamesDConley closed this issue 1 year ago.

JamesDConley commented 1 year ago

(This is from version 0.1.) The current version doesn't appear to have an implementation of MergedLinear, but the version I trained my models with has a bug that is preventing me from merging the weights.

Is there any way I can update my model to use the new version without retraining? I want to calculate the delta_w, combine it with the base weights, and export the result as a GPTNeoX model rather than a PeftModel. (I know this will need some custom code, but once I get the math worked out it should just be: load the model, assign the merged layer weights, and save the state dict, as sketched below.)
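
Something like this is what I'm picturing for the export step (lora_layers and compute_delta_w here are hypothetical stand-ins for the part I haven't worked out yet, and the paths are placeholders):

from transformers import GPTNeoXForCausalLM

# Start from the original base checkpoint (placeholder path)
base = GPTNeoXForCausalLM.from_pretrained("path/to/base-gptneox")
state_dict = base.state_dict()

# lora_layers would map state-dict weight keys to the trained MergedLinear modules,
# and compute_delta_w() is the math this issue is about; both are hypothetical here
for key, lora_layer in lora_layers.items():
    state_dict[key] += compute_delta_w(lora_layer)

base.load_state_dict(state_dict)
base.save_pretrained("path/to/merged-gptneox")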


The peft LoRA implementation is at ~line 263 here: https://github.com/huggingface/peft/blob/29357d41eb5cb6f4161279a5c02379a6a040e882/src/peft/tuners/lora.py

The loralib implementation is at ~line 229 here: https://github.com/microsoft/LoRA/blob/375704a4f7e376f8f1c4b43e2fd524859dac6a59/loralib/layers.py#L156

Interestingly, I've tried both, and unless I'm missing something it seems like neither works for reconstructing the weight delta. Is there a bug in this version of the LoRA implementation?
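
For context, my understanding from both implementations is that for a plain (non-merged) LoRA Linear layer the update is just the scaled low-rank product, delta_w = scaling * (B @ A) with scaling = lora_alpha / r. Here's a toy sanity check of that with made-up shapes; the MergedLinear case is messier because it packs the enabled projections together with the grouped conv and zero_pad, and that's the part I can't get to round-trip:

import torch

# Toy shapes and scaling, just to illustrate the expected reconstruction for lora.Linear
in_features, out_features, r, lora_alpha = 512, 1536, 8, 16
scaling = lora_alpha / r

lora_A = torch.randn(r, in_features)     # same layout as lora_A.weight
lora_B = torch.randn(out_features, r)    # same layout as lora_B.weight
base_weight = torch.randn(out_features, in_features)

delta_w = (lora_B @ lora_A) * scaling    # shape (out_features, in_features)
merged = base_weight + delta_w           # W + scaling * B @ A
print(delta_w.shape)                     # torch.Size([1536, 512])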

JamesDConley commented 1 year ago

After double-checking my environment, it seems I may have trained with peft version 0.2.0, but the problem persists. I'll try again sometime with a smaller NeoX model so I can train a LoRA quickly and see if I can replicate this issue with known package versions.


zhangzuizui commented 1 year ago

There are bugs when running model.eval() with peft 0.2.0; using peft 0.1.0 can solve this problem.

JamesDConley commented 1 year ago

I was able to find a fix. Both 0.2.0 and 0.1.0 appear to have an issue with this; I haven't tested anything in 0.3.0. The code to merge all the MergedLinear layers is below:

import peft
import torch.nn.functional as F


def transpose(weight, fan_in_fan_out):
    return weight.T if fan_in_fan_out else weight


def merge_weights(layer):
    assert not layer.merged, "Module was already merged!"
    # Merge the LoRA update into the frozen weight and mark the layer as merged
    if layer.r > 0 and any(layer.enable_lora):
        # delta_w = B @ A, computed as a grouped 1D conv because MergedLinear
        # packs the enabled projections (e.g. q/k/v) into a single weight
        delta_w = F.conv1d(
            layer.lora_A.weight.data.unsqueeze(0),
            layer.lora_B.weight.data,
            groups=sum(layer.enable_lora),
        ).squeeze(0)
        # Scale, zero-pad the disabled projections back in, and add in place
        layer.weight.data += layer.zero_pad(transpose(delta_w * layer.scaling, True)).T
    layer.merged = True


def apply_peft_merge(module):
    # Recursively merge every MergedLinear layer in the module tree
    if isinstance(module, peft.tuners.lora.MergedLinear):
        merge_weights(module)
    for child in module.children():
        apply_peft_merge(child)
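
For anyone else hitting this, applying it looks roughly like the following (the paths and model class are placeholders for whatever you trained):

import torch
from transformers import GPTNeoXForCausalLM
from peft import PeftModel

base = GPTNeoXForCausalLM.from_pretrained("path/to/base-gptneox")   # placeholder path
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")     # placeholder path

apply_peft_merge(model)

# The wrapped GPTNeoX module now holds the merged weights; drop the lora_* tensors
# so the saved state dict loads into a plain GPTNeoX model
merged_state = {
    k: v for k, v in model.base_model.model.state_dict().items() if "lora_" not in k
}
torch.save(merged_state, "merged_gptneox_state_dict.pt")
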
JamesDConley commented 1 year ago

Big thanks to ShengleiH for providing the changes to the broken code that made the above script work. https://github.com/microsoft/LoRA/issues/75#issuecomment-1558965652

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.