Closed. imrankh46 closed this issue 3 months ago.
If you just want to make GaLore work with linear layers, you can iterate over all the parameters and put the linear ones into the target modules.
Thanks for your reply. 🙂 I am not using PEFT or LoRA.

I want to do this without LoRA and PEFT. Do you mean something like `optim_target_modules=["linear"]`? Am I right?
Actually, it will be something like this: https://github.com/jiaweizzhao/GaLore/blob/master/torchrun_main.py#L265-L275
The argument `optim_target_modules` comes from the Hugging Face implementation, which only requires the module names. So, if you don't know the names of the linear modules, you can use the following trick:
```python
import torch.nn as nn

# Collect the names of all linear layers in the model.
optim_target_modules = []
for module_name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        optim_target_modules.append(module_name)
```
Note that `lm_head` will also be included, which is different from using only `['attn', 'mlp']`. You can apply further filters as you see fit.
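As one way to apply such a filter, you can drop any collected name containing `lm_head` before passing the list on. A minimal sketch with a toy model (the module names `attn_proj`, `mlp_up`, and `lm_head` here are illustrative, not taken from a specific checkpoint):

```python
import torch.nn as nn

# Toy stand-in for a transformer; names are illustrative only.
model = nn.Sequential()
model.add_module("attn_proj", nn.Linear(8, 8))
model.add_module("mlp_up", nn.Linear(8, 8))
model.add_module("lm_head", nn.Linear(8, 2))

# Same collection loop as above, but excluding the output head.
optim_target_modules = [
    name
    for name, module in model.named_modules()
    if isinstance(module, nn.Linear) and "lm_head" not in name
]
print(optim_target_modules)
```

With a real model you would run the same comprehension over the loaded checkpoint instead of the toy `Sequential`.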
Thanks
Great work, and many, many thanks for this.

I already fine-tuned a model, and it's showing the best performance. My question is: if I want to fine-tune Llama 2, Mistral, OpenChat, etc., how can I find the right values for the following?

`optim_target_modules=["attn", "mlp"]`

Because it's suggested to confirm that these `optim_target_modules` match the model; `"mlp"` matches, but the other one does not.

Are there any docs available, or any suggestions?

Thanks.
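One way to check this for any architecture is to list every `nn.Linear` name and see which of the candidate substrings it contains. A sketch with a toy nested model (for Llama 2 or Mistral you would instead load the real checkpoint, e.g. with `transformers.AutoModelForCausalLM.from_pretrained`; the nested names below are only illustrative):

```python
import torch.nn as nn

# Toy stand-in for a transformer block; a real checkpoint would be
# loaded from transformers instead of built by hand like this.
model = nn.ModuleDict({
    "self_attn": nn.ModuleDict({"q_proj": nn.Linear(8, 8)}),
    "mlp": nn.ModuleDict({"gate_proj": nn.Linear(8, 8)}),
    "lm_head": nn.Linear(8, 2),
})

# For each linear layer, report which candidate substrings it matches.
targets = ("attn", "mlp")
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        matched = [t for t in targets if t in name]
        print(f"{name}: matches {matched}")
```

Any layer that reports an empty match list (such as `lm_head` here) would not be picked up by `optim_target_modules=["attn", "mlp"]`, so you can see at a glance which substrings fit a given model's naming scheme.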