jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

How to get optim_target_modules=["attn", "mlp"] for other models? #27

Closed: imrankh46 closed this issue 3 months ago

imrankh46 commented 3 months ago

Great work, and many many thanks for this.

I already fine-tuned a model and it's showing the best performance. My question is: if I want to fine-tune Llama 2, Mistral, OpenChat, etc., how can I get the following?

optim_target_modules=["attn", "mlp"]

Because it suggests making sure to confirm these optim_target_modules for the model: "mlp" matches, but the other one does not.

Are there any docs available, or any suggestions?

Thanks.

PenutChen commented 3 months ago

If you just want to make GaLore work with linear layers, you can iterate over all the modules and put the names of the linear ones into the target modules.

imrankh46 commented 3 months ago

> If you just want to make GaLore work with linear layers, you can iterate over all the modules and put the names of the linear ones into the target modules.

Thanks for your reply. 🙂 I am not using PEFT or LoRA.

I want to do this without LoRA and PEFT. Do you mean something like optim_target_modules=["linear"]?

Am I right?

PenutChen commented 3 months ago

Actually, it will be something like this: https://github.com/jiaweizzhao/GaLore/blob/master/torchrun_main.py#L265-L275
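For reference, the linked lines essentially pick out the attention/MLP linear weights and give them their own optimizer parameter group. Below is a minimal sketch along those lines (not the exact repository code; model is assumed to be already built, and the rank / update_proj_gap / scale values are just the defaults shown in the GaLore README):

import torch.nn as nn
from galore_torch import GaLoreAdamW

# Collect the weights of the attention/MLP linear layers for GaLore.
galore_params = []
target_modules_list = ["attn", "mlp"]
for module_name, module in model.named_modules():
    if not isinstance(module, nn.Linear):
        continue
    if not any(key in module_name for key in target_modules_list):
        continue
    galore_params.append(module.weight)

# Every other parameter is optimized as usual.
id_galore_params = {id(p) for p in galore_params}
regular_params = [p for p in model.parameters() if id(p) not in id_galore_params]

param_groups = [
    {"params": regular_params},
    {"params": galore_params, "rank": 128, "update_proj_gap": 200,
     "scale": 0.25, "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=0.01)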

The argument optim_target_modules comes from the Hugging Face implementation, which matches modules whose names contain the given strings. So, you can use the following trick if you don't know the names of the linear modules:

import torch.nn as nn

# Collect the names of all linear modules in the model.
optim_target_modules = []
for module_name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        optim_target_modules.append(module_name)

Note that lm_head will also be included, which is different from using only ['attn', 'mlp']. You can apply further filters of your choice.
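If you are using the Hugging Face Trainer end to end, a minimal sketch of wiring the collected names into it could look like this (assuming a transformers version that ships the GaLore integration; model is assumed to be already loaded, and output_dir, batch size, and learning rate are placeholder values):

import torch.nn as nn
from transformers import TrainingArguments

# Names of all linear modules, with lm_head filtered out.
optim_target_modules = [
    name
    for name, module in model.named_modules()
    if isinstance(module, nn.Linear) and "lm_head" not in name
]

training_args = TrainingArguments(
    output_dir="galore-finetune",              # placeholder output directory
    optim="galore_adamw",                      # GaLore variant of AdamW
    optim_target_modules=optim_target_modules,
    per_device_train_batch_size=1,
    learning_rate=2e-5,
)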

imrankh46 commented 3 months ago

> Actually, it will be something like this: https://github.com/jiaweizzhao/GaLore/blob/master/torchrun_main.py#L265-L275
>
> The argument optim_target_modules comes from the Hugging Face implementation, which matches modules whose names contain the given strings. So, you can use the following trick if you don't know the names of the linear modules:
>
>     import torch.nn as nn
>
>     # Collect the names of all linear modules in the model.
>     optim_target_modules = []
>     for module_name, module in model.named_modules():
>         if isinstance(module, nn.Linear):
>             optim_target_modules.append(module_name)
>
> Note that lm_head will also be included, which is different from using only ['attn', 'mlp']. You can apply further filters of your choice.

Thanks