microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License

[MiniLLM] About the gradient accumulation in finetune.py #170

Closed: songmzhang closed this issue 7 months ago

songmzhang commented 7 months ago

In finetune.py, lines 283-284, model.backward(loss) is directly followed by model.step(). Does this mean that the model parameters are updated immediately after the loss of the current step is back-propagated, so that args.gradient_accumulation_steps has no effect? Is this a bug?

t1101675 commented 7 months ago

Gradient accumulation is handled automatically by DeepSpeed.
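For anyone landing here later: the DeepSpeed engine tracks micro-steps internally, so calling model.backward(loss) and model.step() on every batch is the intended pattern. backward() scales the loss by 1/gradient_accumulation_steps and accumulates gradients, while step() only applies a real optimizer update at accumulation boundaries. Below is a minimal sketch of this pattern with a toy model and assumed config values (not MiniLLM's actual settings):

```python
import torch
import deepspeed

# Illustrative DeepSpeed config (assumed values, not the repo's settings).
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
}

# Toy stand-in for the language model and training data.
model = torch.nn.Linear(16, 1)
dataset = torch.utils.data.TensorDataset(torch.randn(256, 16), torch.randn(256, 1))

model_engine, _, dataloader, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    training_data=dataset,
    config=ds_config,
)

for x, y in dataloader:
    x, y = x.to(model_engine.device), y.to(model_engine.device)
    loss = torch.nn.functional.mse_loss(model_engine(x), y)
    model_engine.backward(loss)  # scales the loss by 1/8 and accumulates gradients
    model_engine.step()          # optimizer update only on every 8th micro-step
```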

songmzhang commented 7 months ago

> Gradient accumulation is handled automatically by DeepSpeed.

OK, thanks!