zhc7 opened this issue 2 months ago
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hitting this snag right now, actually; we'll see what we decide to do.
We're working with the DeepSpeed team to try to remove the engine wrapper entirely. However, as a user you can always call `model.engine.backward()` etc. manually in Accelerate without causing harm.
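Here is a minimal sketch of why calling the engine's methods yourself helps. It uses a hypothetical stand-in engine. `StandInEngine`, `train_step` and `max_grad` are illustrative names, not DeepSpeed or Accelerate APIs, and the real `engine.step()` does more than a plain SGD update. Issuing `backward()` and `step()` as two separate calls leaves a window in which the gradient can be inspected before the update and zeroing happen.

```python
class StandInEngine:
    """Hypothetical toy engine with one scalar parameter (NOT the DeepSpeed API).

    backward() accumulates a gradient; step() applies a plain SGD update and
    then zeroes the gradient, mirroring the part of engine.step() at issue.
    """

    def __init__(self, param: float, lr: float) -> None:
        self.param = param
        self.lr = lr
        self.grad = 0.0

    def backward(self, loss_grad: float) -> None:
        # Accumulate d(loss)/d(param) into the gradient buffer.
        self.grad += loss_grad

    def step(self) -> None:
        # Optimizer update, then zero the gradient in the same call.
        self.param -= self.lr * self.grad
        self.grad = 0.0


def train_step(engine: StandInEngine, loss_grad: float, max_grad: float) -> float:
    """Hypothetical helper: backward, inspect, then step (or skip).

    Returns the gradient observed between backward() and step().
    """
    engine.backward(loss_grad)
    observed = engine.grad  # still intact: step() has not run yet
    if abs(observed) > max_grad:
        # Toy policy: drop the update on an oversized gradient.
        engine.grad = 0.0
        return observed
    engine.step()
    return observed


engine = StandInEngine(param=1.0, lr=0.5)
seen = train_step(engine, loss_grad=2.0, max_grad=10.0)
print(seen, engine.param, engine.grad)  # prints: 2.0 0.0 0.0
```

The skip-on-large-gradient policy is only there to show that a user decision can sit between the two calls. Any check on gradients or parameters would fit in the same place.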
The source code of DeepSpeedEngineWrapper:
My question is: why do we need to call `self.engine.step()` here immediately? This zeroes the gradients and changes the parameters without notifying the user, which may be unexpected. Because the backward pass is internally bound to zeroing the gradients and updating the parameters, users cannot inspect the gradients or parameters manually before the step. I know DeepSpeed-wrapped models can't be treated as normal models, but this behavior still eliminates a lot of flexibility.
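Here is a minimal sketch of the coupling described above, built on a hypothetical toy engine. `RecordingEngine` and `CoupledWrapper` are illustrative names, not DeepSpeed or Accelerate classes, and this is not the actual wrapper source. The wrapper's `backward()` calls `step()` before returning, so the caller only ever sees the state after the gradient has been applied and zeroed.

```python
from typing import List


class RecordingEngine:
    """Hypothetical toy engine (not DeepSpeed) with one scalar parameter.

    Every call is logged so the ordering of backward/step is visible.
    """

    def __init__(self, param: float, lr: float) -> None:
        self.param = param
        self.lr = lr
        self.grad = 0.0
        self.calls: List[str] = []

    def backward(self, loss_grad: float) -> None:
        self.calls.append("backward")
        self.grad += loss_grad  # accumulate the gradient

    def step(self) -> None:
        self.calls.append("step")
        self.param -= self.lr * self.grad  # optimizer update
        self.grad = 0.0  # zero grad in the same call


class CoupledWrapper:
    """Sketch of the pattern under discussion: backward() also steps."""

    def __init__(self, engine: RecordingEngine) -> None:
        self.engine = engine

    def backward(self, loss_grad: float) -> None:
        self.engine.backward(loss_grad)
        # Stepping here means the update and zeroing happen before control
        # returns to the caller, so the gradient is never observable.
        self.engine.step()


engine = RecordingEngine(param=1.0, lr=0.5)
CoupledWrapper(engine).backward(2.0)
print(engine.calls, engine.grad, engine.param)  # prints: ['backward', 'step'] 0.0 0.0
```

The gradient of 2.0 existed only inside `CoupledWrapper.backward()`. Moving `step()` out of `backward()` reopens that window, at the cost of the training loop having to call `step()` itself.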