zyushun / Adam-mini

Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793
257 stars 9 forks

Adam mini isn't compatible with HuggingFace #2

Closed hahuyhoang411 closed 1 month ago

hahuyhoang411 commented 1 month ago

I created a custom optimizer to test Adam-mini with the HuggingFace Trainer, but no luck. Any help?

   File "voice_data_process/adam_mini_train.py", line 204, in <module>
     trainer_stats = trainer.train() # for resuming the training from checkpoint # resume_from_checkpoint=True
                     ^^^^^^^^^^^^^^^
   File "local/lib/python3.11/site-packages/trl/trainer/sft_trainer.py", line 440, in train
     output = super().train(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "local/lib/python3.11/site-packages/transformers/trainer.py", line 1885, in train
     return inner_training_loop(
            ^^^^^^^^^^^^^^^^^^^^
   File "local/lib/python3.11/site-packages/transformers/trainer.py", line 2279, in _inner_training_loop
     self.optimizer.step()
   File "local/lib/python3.11/site-packages/accelerate/optimizer.py", line 170, in step
     self.optimizer.step(closure)
   File "local/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 75, in wrapper
     return wrapped(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^
   File "local/lib/python3.11/site-packages/torch/optim/optimizer.py", line 391, in wrapper
     out = func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
   File "voice_data_process/Adam_mini.py", line 231, in step
     tmp_lr = tmp_lr.to(grad.device)
                        ^^^^
 UnboundLocalError: cannot access local variable 'grad' where it is not associated with a value
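The traceback points at a classic Python pitfall: inside `step`, `grad` is bound only on a conditional path (for example, when a parameter actually has a gradient), but `tmp_lr.to(grad.device)` runs unconditionally, so `grad` can be read before it is ever assigned. A minimal stdlib-only sketch of the bug class and the usual guard fix; the names here are illustrative stand-ins, not Adam-mini's actual code:

```python
def step_buggy(params):
    """Binds `grad` only when a gradient exists, then uses it unconditionally."""
    for p in params:
        if p.get("grad") is not None:
            grad = p["grad"]   # only bound on this branch
    return grad                # UnboundLocalError if no param had a grad

def step_fixed(params):
    """Skip gradient-less params and guard before using `grad`."""
    grad = None
    for p in params:
        if p.get("grad") is None:   # e.g. frozen params, or before backward()
            continue
        grad = p["grad"]
    if grad is None:                # nothing to update; bail out safely
        return None
    return grad

# A parameter with no gradient triggers the bug:
try:
    step_buggy([{"grad": None}])
except UnboundLocalError as e:
    print("buggy:", e)

print("fixed:", step_fixed([{"grad": None}]))   # → fixed: None
print("fixed:", step_fixed([{"grad": 3}]))      # → fixed: 3
```

The same shape of fix applies in an optimizer `step()`: either guard the device-transfer line behind the branch that binds `grad`, or `continue` early for parameters whose `.grad` is `None`.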
zyushun commented 1 month ago

Hi,

Thanks for mentioning this! We have fixed this bug and updated the code in our repository. Please try again with the latest `Adam_mini.py`.

hahuyhoang411 commented 1 month ago

Awesome, it works like a champ!