Open zyushun opened 4 months ago
Did you solve it? I have the same error.
not yet
I have the same problem as well.
Same problem here. The structure of my grouped parameters is:
List[Dict['params'], Dict['params', 'rank', 'update_proj_gap', 'scale', 'proj_type']]
where 'params' is a list of tensors. I'm trying with pretrained models from Hugging Face.
I found the error. The projection to a lower rank requires tensors of dimension 2 (matrices). If 1-D parameters such as LayerNorm weights are added to galore_params, the code raises this error when it tries to access the second dimension.
Make sure that ALL the parameters sent to galore_params have a second dimension.
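As a sketch, one way to enforce that rule is to split parameters by dimensionality before building the groups (`split_galore_params` is just an illustrative name, not part of the GaLore API):

```python
import torch
from torch import nn

def split_galore_params(model: nn.Module):
    """Route only 2-D tensors to the GaLore group; everything else
    (LayerNorm weights, biases, other 1-D tensors) stays regular."""
    galore_params, regular_params = [], []
    for _, param in model.named_parameters():
        (galore_params if param.dim() == 2 else regular_params).append(param)
    return galore_params, regular_params

# Toy model: the Linear weight is 2-D; the LayerNorm weight and both biases are 1-D.
model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8))
galore, regular = split_galore_params(model)
print(len(galore), len(regular))  # 1 3
```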
Do you use DeepSpeed and train on multiple nodes? I just select the attention and MLP modules for GaLore.
I'm just testing GaLore locally with plain PyTorch; this is the function I've been using to group the parameters:
```python
def galore_parameters(model):
    galore_params = []
    non_galore_params = []
    for name, param in model.named_parameters():
        if 'embeddings' in name and 'LayerNorm' not in name:
            galore_params.append(param)
            continue
        if 'layer' in name and 'weight' in name and 'LayerNorm' not in name:
            galore_params.append(param)
            continue
        if 'classifier' in name and 'bias' not in name:
            galore_params.append(param)
            continue
        non_galore_params.append(param)
    # Fail fast: GaLore's projector needs a second dimension on every tensor.
    for param in galore_params:
        if param.dim() != 2:
            raise ValueError('GaLore only supports 2D parameters')
    param_groups = [
        {'params': non_galore_params},
        {'params': galore_params, 'rank': 128, 'update_proj_gap': 200,
         'scale': 0.25, 'proj_type': 'std'},
        # 'proj_type' options: 'std', 'reverse_std', 'right', 'left', 'full'
    ]
    return param_groups
```
Consider the 'model' variable a pre-trained model loaded from Hugging Face. I've checked the different layer names with the VS Code debugger and set the 'if' statements accordingly, so you should change them for your specific model. For instance, 'classifier' applies only to classification heads on top of Language Models.
The test has been done with RoBERTa.
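As a quick sanity check, that grouping function can be run on a toy module whose parameter names mimic a Hugging Face encoder. The toy model below is illustrative only (not RoBERTa), and the function is reproduced in condensed form so the snippet runs on its own:

```python
import torch
from torch import nn

def galore_parameters(model):
    # Same grouping logic as the function above, condensed into one condition.
    galore_params, non_galore_params = [], []
    for name, param in model.named_parameters():
        if ('embeddings' in name and 'LayerNorm' not in name) \
           or ('layer' in name and 'weight' in name and 'LayerNorm' not in name) \
           or ('classifier' in name and 'bias' not in name):
            galore_params.append(param)
        else:
            non_galore_params.append(param)
    for param in galore_params:
        if param.dim() != 2:
            raise ValueError('GaLore only supports 2D parameters')
    return [{'params': non_galore_params},
            {'params': galore_params, 'rank': 128, 'update_proj_gap': 200,
             'scale': 0.25, 'proj_type': 'std'}]

# Toy module whose attribute names produce HF-like parameter names:
# 'embeddings.weight', 'layer.weight', 'layer.bias', 'classifier.weight', 'classifier.bias'
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.embeddings = nn.Embedding(10, 8)  # 2-D weight -> GaLore group
        self.layer = nn.Linear(8, 8)           # 2-D weight -> GaLore; 1-D bias -> regular
        self.classifier = nn.Linear(8, 2)      # 2-D weight -> GaLore; 1-D bias -> regular

groups = galore_parameters(Toy())
print(len(groups[0]['params']), len(groups[1]['params']))  # 2 3
```

Every tensor in the GaLore group here is 2-D, so the `shape[1]` access inside the projector never fails.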
From my rough understanding, GaLore only works with nn.Linear().weight.
Actually, it worked for me with the embedding layers, which come from nn.Embedding().
I only found the above-mentioned problem when the current layer has only one dimension in its tensors, i.e. size = [768] instead of [768, 768].
Specifically, the error comes from line 15 in galore_projector.py:
`if full_rank_grad.shape[0] >= full_rank_grad.shape[1]:`
For the standard projection type it compares the first and second dimensions of each tensor in galore_params. You would hit the same error with every projection type that compares both dimensions.
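A minimal reproduction of that IndexError, using plain shape tuples in place of real tensors (no GPU or model needed):

```python
# Shapes standing in for tensor.shape: a 2-D weight vs. a 1-D LayerNorm weight.
weight_shape = (768, 768)
layernorm_shape = (768,)

# The 2-D case indexes both dimensions without trouble.
assert weight_shape[0] >= weight_shape[1]

# The 1-D case has no shape[1], so the comparison raises IndexError,
# which is what happens on line 15 of galore_projector.py.
try:
    layernorm_shape[0] >= layernorm_shape[1]
except IndexError:
    print("IndexError: tuple index out of range")
```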
I am facing the same issue. I am training with FSDP and use_orig_params=True, but FSDP still flattens the parameters. Has anyone faced this issue before and been able to fix it?
Hi Jiawei,
I was trying GaLore on TinyLlama-1B using the codebase https://github.com/jzhang38/TinyLlama on 4x A800-80GB GPUs. I encountered the following error:
I used GaLore as you suggested in torchrun_main.py:
Any idea why, and how to fix it?
Thanks in advance!