
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

IndexError: tuple index out of range #47

Open zyushun opened 4 months ago

zyushun commented 4 months ago

Hi Jiawei,

I was trying GaLore on TinyLlama-1B using the codebase https://github.com/jzhang38/TinyLlama on 4x A800-80GB GPUs. I encountered the following error:

[rank1]:     optimizer.step()
[rank1]:   File "/mntcephfs/lab_data/zhangyushun/anaconda/tinyllama/lib/python3.10/site-packages/lightning/fabric/wrappers.py", line 74, in step
[rank1]:     output = self._strategy.optimizer_step(
[rank1]:   File "/mntcephfs/lab_data/zhangyushun/anaconda/tinyllama/lib/python3.10/site-packages/lightning/fabric/strategies/strategy.py", line 207, in optimizer_step
[rank1]:     return self.precision.optimizer_step(optimizer, **kwargs)
[rank1]:   File "/mntcephfs/lab_data/zhangyushun/anaconda/tinyllama/lib/python3.10/site-packages/lightning/fabric/plugins/precision/fsdp.py", line 142, in optimizer_step
[rank1]:     return super().optimizer_step(optimizer, **kwargs)
[rank1]:   File "/mntcephfs/lab_data/zhangyushun/anaconda/tinyllama/lib/python3.10/site-packages/lightning/fabric/plugins/precision/precision.py", line 124, in optimizer_step
[rank1]:     return optimizer.step(**kwargs)
[rank1]:   File "/mntcephfs/lab_data/zhangyushun/anaconda/tinyllama/lib/python3.10/site-packages/torch/optim/optimizer.py", line 391, in wrapper
[rank1]:     out = func(*args, **kwargs)
[rank1]:   File "/mntcephfs/lab_data/zhangyushun/anaconda/tinyllama/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank1]:     return func(*args, **kwargs)
[rank1]:   File "/mntcephfs/lab_data/zhangyushun/anaconda/tinyllama/lib/python3.10/site-packages/galore_torch/adamw.py", line 96, in step
[rank1]:     grad = state["projector"].project(grad, state["step"])
[rank1]:   File "/mntcephfs/lab_data/zhangyushun/anaconda/tinyllama/lib/python3.10/site-packages/galore_torch/galore_projector.py", line 15, in project
[rank1]:     if full_rank_grad.shape[0] >= full_rank_grad.shape[1]:
[rank1]: IndexError: tuple index out of range

I use GaLore as you suggested in torchrun_main.py:

print('using galore')
galore_params = []
target_modules_list = [ "attn", "mlp"]
for module_name, module in model.named_modules():
    if not isinstance(module, nn.Linear):
        continue

    if not any(target_key in module_name for target_key in target_modules_list):
        continue

    print('enable GaLore for weights in module: ', module_name)
    galore_params.append(module.weight)

id_galore_params = [id(p) for p in galore_params]

# move parameters without "rank" into another group
regular_params = [p for p in model.parameters() if id(p) not in id_galore_params]
# then call galore_adamw
param_groups = [{'params': regular_params}, 
                {'params': galore_params, 'rank': 128, 'update_proj_gap': 200, 'scale': 0.25, 'proj_type': 'std'}]

optimizer = GaLoreAdamW(param_groups, lr=learning_rate)

Any idea why and how to fix it?

Thanks in advance!

nicosouth commented 4 months ago

Did you solve it? I have the same error.

zyushun commented 4 months ago

> Did you solve it? I have the same error.

Not yet.

Jackie0601zhou commented 4 months ago

I have the same problem as well.

FabioDataGeek commented 4 months ago

Same problem here. The structure of the grouped parameters is:

List[Dict['params'], Dict['params', 'rank', 'update_proj_gap', 'scale', 'proj_type']]

where 'params' is a list of tensors. I'm testing with pretrained models from Hugging Face.

FabioDataGeek commented 4 months ago

I found the error. For the projection to a lower rank, you need tensors of dimension 2 (matrices). If parameters of dimension 1, such as LayerNorm weights, are added to galore_params, it will raise this error when trying to access the second dimension.

Make sure that ALL the parameters sent to galore_params have a second dimension.
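
For reference, a minimal sketch of that check applied to the selection loop from the original post (same names as the snippet above; the dim() guard is the only addition, and nn.Linear weights are normally 2-D anyway unless something has flattened them):

import torch.nn as nn

galore_params = []
target_modules_list = ["attn", "mlp"]
for module_name, module in model.named_modules():
    if not isinstance(module, nn.Linear):
        continue
    if not any(target_key in module_name for target_key in target_modules_list):
        continue
    # GaLore projects 2-D weight matrices; skip anything without a second dimension
    if module.weight.dim() != 2:
        continue
    galore_params.append(module.weight)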

nicosouth commented 4 months ago

> I found the error. For the projection to a lower rank, you need tensors of dimension 2 (matrices). If parameters of dimension 1, such as LayerNorm weights, are added to galore_params, it will raise this error when trying to access the second dimension.
>
> Make sure that ALL the parameters sent to galore_params have a second dimension.

Do you use DeepSpeed and train on multiple nodes? I just select the attn and mlp modules for GaLore.

FabioDataGeek commented 4 months ago

I'm just testing GaLore locally with plain PyTorch; this is the function I've been using to group the parameters:

def galore_parameters(model):
    galore_params = []
    non_galore_params = []
    for name, param in model.named_parameters():

        if 'embeddings' in name and not 'LayerNorm' in name:
            galore_params.append(param)
            continue

        if 'layer' in name and 'weight' in name and not 'LayerNorm' in name:
            galore_params.append(param)
            continue

        if 'classifier' in name and not 'bias' in name:
            galore_params.append(param)
            continue

        # everything else goes to the regular optimizer group
        non_galore_params.append(param)

    # GaLore projection only works on 2-D tensors
    for param in galore_params:
        if param.dim() != 2:
            raise ValueError('GaLore only supports 2D parameters')

    param_groups = [{'params': non_galore_params},
                    {'params': galore_params, 'rank': 128, 'update_proj_gap': 200, 'scale': 0.25, 'proj_type': 'std'}]  # 'proj_type': 'std', 'reverse_std', 'right', 'left', 'full'

    return param_groups

Consider the 'model' variable a pre-trained model loaded from Hugging Face. I've checked the different layer names with the VS Code debugger and set the 'if' statements accordingly, so you should change them for your specific model. For instance, 'classifier' applies only to classification heads on top of language models.

The test was done with RoBERTa.
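
As a rough usage sketch (assuming galore_torch is installed and `model` is a Hugging Face model such as RobertaForSequenceClassification; the learning rate is just a placeholder):

from galore_torch import GaLoreAdamW

param_groups = galore_parameters(model)
optimizer = GaLoreAdamW(param_groups, lr=1e-5)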

dinhanhx commented 4 months ago

From my wacky understanding, GaLore only works with nn.Linear().weight.

FabioDataGeek commented 4 months ago

Actually, it worked for me with the embedding layers, which come from nn.Embedding(). I only hit the above-mentioned problem when the current layer has only one dimension in its tensors, i.e. size [768] instead of [768, 768].
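
A quick way to see the difference (plain PyTorch; the sizes are only illustrative):

import torch.nn as nn

emb = nn.Embedding(30522, 768)
ln = nn.LayerNorm(768)
print(emb.weight.shape)  # torch.Size([30522, 768]) -> 2-D, GaLore can project it
print(ln.weight.shape)   # torch.Size([768])        -> 1-D, triggers the IndexError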

FabioDataGeek commented 4 months ago

Specifically, the error comes from line 15 in galore_projector.py:

if full_rank_grad.shape[0] >= full_rank_grad.shape[1]:

For the standard projection type, it compares the first and second dimensions of each tensor passed in galore_params. You would get the same error with any projection type that compares both dimensions.
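
To illustrate why a 1-D gradient trips that line (nothing GaLore-specific here):

import torch

grad_2d = torch.randn(768, 768)
grad_1d = torch.randn(768)  # e.g. the gradient of a LayerNorm weight or a bias

print(grad_2d.shape[0] >= grad_2d.shape[1])  # fine: both dimensions exist
print(grad_1d.shape[0] >= grad_1d.shape[1])  # IndexError: tuple index out of range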

Shinechaote commented 1 week ago

I am facing the same issue. I am training with FSDP and use_orig_params=True; however, FSDP still flattens the parameters. Has anyone run into this before and been able to fix it?
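
Not a fix, but one way to confirm whether FSDP has flattened the parameters before the optimizer sees them (a rough diagnostic sketch; `model` is assumed to already be wrapped in FSDP):

# Print the shape of every trainable parameter after FSDP wrapping.
# If the GaLore weights show up here as 1-D flat tensors, the projector's
# shape[1] access will raise the same IndexError.
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name, tuple(param.shape), param.dim())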