sail-sg / Adan

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Apache License 2.0

Deepspeed Integration #36

Closed pUmpKin-Co closed 1 year ago

pUmpKin-Co commented 1 year ago

Hi~ Thanks for your excellent work. The Adan optimizer has achieved great success across my experiments. However, I would appreciate any suggestions for integrating Adan with DeepSpeed. I tried taking the ds_config I use with AdamW and simply replacing AdamW with Adan (adjusting the learning rate and weight decay accordingly), but training is quite slow. Thank you in advance.

XingyuXie commented 1 year ago

Have you already integrated Adan into DS? Integrating Adan with DS alone versus with Megatron-DS or Megatron-LM follows different steps. Which one do you prefer?

pUmpKin-Co commented 1 year ago

Actually, I'm only concerned with plain DS. I'm not really doing much integration work: I just take the Adan implementation that ships with timm and wrap it with DS via deepspeed.initialize().

XingyuXie commented 1 year ago

Actually, in my own setup I integrated Adan at a deeper level of DS. The steps are:

For now, you can use Adan through the DS config. I hope these steps help solve the low-speed problem.
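Once Adan is registered inside DeepSpeed's optimizer builder (the deeper integration mentioned above), the config-driven usage could look like this sketch. The `"Adan"` type name and the parameter keys are assumptions about such a custom registration, not a stock DeepSpeed feature:

```json
{
  "train_batch_size": 8,
  "optimizer": {
    "type": "Adan",
    "params": {
      "lr": 1e-3,
      "betas": [0.98, 0.92, 0.99],
      "eps": 1e-8,
      "weight_decay": 0.02
    }
  }
}
```

The betas shown follow Adan's three-moment defaults; with a config entry like this, no client optimizer needs to be passed to deepspeed.initialize().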

pUmpKin-Co commented 1 year ago

Thanks! That works and resolves all of my concerns.