jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Apache License 2.0

layerwise optimizer raises TypeError about slice indices #24

Closed winglian closed 8 months ago

winglian commented 8 months ago

  File "/workspace/transformers/src/transformers/trainer.py", line 1297, in optimizer_hook
    optimizer_dict[param].step()
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/galore_torch/adamw.py", line 96, in step
    grad = state["projector"].project(grad, state["step"])
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/galore_torch/galore_projector.py", line 21, in project
    self.ortho_matrix = self.get_orthogonal_matrix(full_rank_grad, self.rank, type='left')
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/galore_torch/galore_projector.py", line 94, in get_orthogonal_matrix
    A = U[:, :rank]
TypeError: slice indices must be integers or None or have an __index__ method

jiaweizzhao commented 8 months ago

@winglian I will look into this. It seems U has an incorrect shape or type. Could you provide more details?

winglian commented 8 months ago

It looks like the rank value in the optimizer args was cast to a string. No issue on your end. Thanks!
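
For anyone who lands here with the same traceback, here is a minimal sketch of the failure and one possible fix. It uses a plain Python list instead of a tensor so it runs without torch; list slicing raises the same TypeError message when the slice bound is a string. The helpers `parse_optim_args` and `coerce_rank` are hypothetical names made up for this illustration, not part of galore_torch or transformers, and the "key=value" string format is an assumption about how the optimizer args were passed.

```python
def parse_optim_args(raw):
    """Hypothetical parser: split 'key=value,key=value' into a dict.

    Every value comes back as a str, which is how a rank can end up
    as "4" instead of 4.
    """
    items = [item for item in raw.replace(" ", "").split(",") if item]
    return {key: value for key, value in (item.split("=", 1) for item in items)}


def coerce_rank(value):
    """Hypothetical helper: turn a rank given as str or int into a positive int."""
    rank = int(value)  # raises ValueError for input such as "abc"
    if rank <= 0:
        raise ValueError(f"rank must be positive, got {rank}")
    return rank


args = parse_optim_args("rank=4, update_proj_gap=200")
columns = list(range(8))  # stand-in for the columns of U

# Slicing with the raw string reproduces the error from the traceback.
try:
    columns[: args["rank"]]
except TypeError as exc:
    error_message = str(exc)

# Casting to int before slicing avoids it.
top = columns[: coerce_rank(args["rank"])]
```

Casting at the point where the args are parsed, rather than inside the optimizer, keeps the error close to its cause: a bad value fails immediately with a clear ValueError instead of surfacing later as a slicing error deep in the projector.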