notfacezhi opened this issue 1 year ago
@notfacezhi You can add find_unused_parameters=True in the configuration.
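For example (a minimal sketch; the base config name below is only a placeholder for whichever config you are actually extending), the key can be added at the top level of the config file:

```python
# my_config.py -- hypothetical config name, extend your own config instead
_base_ = ['./faster_rcnn_r50_fpn_1x_coco.py']  # placeholder base config

# Top-level key read by the runner and passed to the distributed model
# wrapper, so DDP tolerates parameters that receive no gradient in a step.
find_unused_parameters = True
```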
Thank you! I tried it and found it works fine. Wish you a happy life!
Hi @hhaAndroid, I met the same error. I added my own modifications to the official code; it runs fine in single-GPU mode but fails in the multi-GPU case. I also added find_unused_parameters=True in the configuration. However, a new error appeared:
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the forward
function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint
functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Why does this happen and how can I solve it? Thank you!
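For reference, a minimal PyTorch-level sketch of the _set_static_graph() workaround that the error message itself mentions; model, build_my_model, and local_rank below are placeholders, since MMDetection normally wraps the model in MMDistributedDataParallel for you:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# Placeholders: build your model and move it to the local GPU as usual.
model = build_my_model().cuda(local_rank)  # hypothetical helper and rank variable

ddp_model = DDP(model, device_ids=[local_rank], find_unused_parameters=True)

# Workaround named in the error: declare the autograd graph static, i.e. the
# set of used/unused parameters does not change between iterations.
ddp_model._set_static_graph()

# On PyTorch >= 1.11 the public equivalent is the static_graph flag:
# ddp_model = DDP(model, device_ids=[local_rank], static_graph=True)
```

As the error message says, this failure usually means the same parameters take part in more than one forward/backward within a single step (for example a module called twice in forward, or torch.utils.checkpoint reusing parameters), which only surfaces once DDP starts tracking gradient readiness across GPUs.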
Have you solved the problem? I am running into the same issue now. I added my own modifications to the official code and found that it runs in single-GPU mode but not in the multi-GPU case. What should I do?