Hi! Thanks for your excellent work. The Adan optimizer has achieved great success across my different experiments. However, I would really appreciate suggestions for integrating Adan with DeepSpeed. I tried taking the ds_config for AdamW and simply replacing AdamW with Adan (adjusting the learning rate and weight decay accordingly, of course), but training is quite slow. Thank you in advance.
Have you already integrated Adan in DS? Integrating Adan with DS versus with Megatron-DS or Megatron-LM may follow different steps. Which one do you prefer?
Actually, I'm solely concerned with DS. I'm not really doing a whole lot of integration, just using the Adan that comes with timm and wrapping it with DS (i.e., `deepspeed.initialize()`).
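For reference, a minimal sketch of that wrapping approach, under stated assumptions: the `from timm.optim import Adan` import path and all hyperparameters are illustrative, the model is a stand-in, and `ds_config.json` is a hypothetical config file (which should not also request an optimizer, since DeepSpeed does not allow specifying one both in the config and as a client object):

```python
import torch
import deepspeed
from timm.optim import Adan  # adjust the import to your Adan install

# Stand-in model; replace with your own network.
model = torch.nn.Linear(768, 768)

# Build Adan on the client side; DeepSpeed wraps this instance
# instead of constructing its own optimizer.
optimizer = Adan(model.parameters(), lr=1e-3, weight_decay=0.02)

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    config="ds_config.json",  # hypothetical config, no "optimizer" section
)
```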
Actually, for myself, I integrated Adan at a deeper level of DS. The steps are as follows.
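First, the import used in the next step assumes an `ADAN_OPTIMIZER` constant exists in `DeepSpeed/deepspeed/runtime/config.py`. A minimal sketch of that prerequisite, following the pattern of DeepSpeed's existing optimizer-name constants (the exact string `'adan'` and the `DEEPSPEED_OPTIMIZERS` registration are assumptions to check against your DS version):

```python
# deepspeed/runtime/config.py

# Name that selects Adan from the "optimizer" section of a DS config.
ADAN_OPTIMIZER = 'adan'

# Register the name so DeepSpeed's supported-optimizer check accepts it.
DEEPSPEED_OPTIMIZERS.append(ADAN_OPTIMIZER)
```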
Then, in `DeepSpeed/deepspeed/runtime/engine.py`, add a branch for Adan to the optimizer dispatch:

```python
from deepspeed.runtime.config import ADAN_OPTIMIZER

# Inside DeepSpeedEngine._configure_basic_optimizer(), alongside the
# existing optimizer branches:
elif self.optimizer_name() == ADAN_OPTIMIZER:
    from adan import Adan
    optimizer = Adan(model_parameters, **optimizer_parameters)
```
You can add this branch just below the existing LAMB one:
```python
elif self.optimizer_name() == LAMB_OPTIMIZER:
    from deepspeed.ops.lamb import FusedLamb
    optimizer = FusedLamb(model_parameters, **optimizer_parameters)
```
From then on, you can select Adan in your DS config. I hope these steps help solve the low-speed problem.
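For completeness, here is a minimal sketch of a DS config that selects the newly wired-in optimizer. The `"adan"` string must match the `ADAN_OPTIMIZER` value above; the batch size and hyperparameters are purely illustrative (the three betas follow Adan's defaults). The dict can be passed directly as `deepspeed.initialize(..., config=ds_config)` or saved as a JSON file:

```python
ds_config = {
    "train_batch_size": 64,
    "optimizer": {
        "type": "adan",  # must equal the ADAN_OPTIMIZER constant
        "params": {
            "lr": 1e-3,
            "betas": [0.98, 0.92, 0.99],  # Adan uses three betas
            "weight_decay": 0.02,
        },
    },
}
```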
Thanks! It works out and solves all of my concerns.