Open benjamin-reichman opened 2 years ago
@benjamin-reichman, thanks for reporting this error. That Megatron-LM code is quite old. Is it possible for you to use this version? This one is more actively used. Thanks!
@benjamin-reichman, did you ever solve this? I'm running into the same problem and would like to know how you fixed it.
I'm running into the same problem as well.
I keep running into this trouble with the optimizer and I'm not sure what is causing it. The error is below:
My training loop is nothing fancy, pretty standard:
I initialize DeepSpeed like this:

```python
self.model, self.optimizer, _, self.lr_scheduler = ds.initialize(
    model=self.model,
    config_params=self.deepspeed_config,
    optimizer=self.optimizer,
    lr_scheduler=self.lr_scheduler,
)
```
I am using the example configurations from here: https://github.com/microsoft/DeepSpeedExamples/blob/master/Megatron-LM/scripts/ds_zero2_config.json
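For reference, a ZeRO stage-2 DeepSpeed config of that kind looks roughly like the sketch below. This is not the exact contents of the linked `ds_zero2_config.json` — the batch size, learning rate, and other values here are placeholders; only the key names are standard DeepSpeed config fields:

```json
{
  "train_batch_size": 32,
  "gradient_accumulation_steps": 1,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2
  },
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 1.5e-4
    }
  }
}
```

Note that when the config contains an `"optimizer"` section, DeepSpeed builds the optimizer itself, which can conflict with also passing `optimizer=` to `ds.initialize` — usually you supply one or the other, not both.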
Does anyone know why I am getting this issue?
Thank you!