Closed: che-sh closed this pull request 3 weeks ago
This pull request was exported from Phabricator. Differential Revision: D64572844
Summary: To avoid issues with checkpointing and restoring, PipelinedPreproc should behave "transparently" to operations related to the model structure, specifically the ones that save or load state_dict. To do so, we want to preserve the original fully qualified name (FQN) of the preproc module inside the PipelinedPreproc class that wraps it. The relevant methods are: 1) named_modules 2) named_parameters 3) named_buffers 4) state_dict 5) load_state_dict
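The FQN-preserving idea above can be sketched without torch. This is a minimal toy, not the PipelinedPreproc implementation: ToyModule, TransparentWrapper, build_model, and the "scale" parameter are all hypothetical names invented for illustration. ToyModule only mimics how a module tree prefixes child names, and its load_state_dict recurses through each child's own load_state_dict, which is assumed here to resemble the delegation the description attributes to DistributedModelParallel.

```python
from typing import Dict, Iterator, Tuple


class ToyModule:
    """Torch-free stand-in for a module: holds parameters and named children."""

    def __init__(self) -> None:
        self.params: Dict[str, float] = {}
        self.children: Dict[str, "ToyModule"] = {}

    def named_parameters(self, prefix: str = "") -> Iterator[Tuple[str, float]]:
        # Own parameters first, then children with "<child_name>." appended.
        for name, value in self.params.items():
            yield prefix + name, value
        for child_name, child in self.children.items():
            yield from child.named_parameters(prefix + child_name + ".")

    def state_dict(self, prefix: str = "") -> Dict[str, float]:
        return dict(self.named_parameters(prefix))

    def load_state_dict(self, state: Dict[str, float], prefix: str = "") -> None:
        # Assumption: the parent delegates to each child's own load_state_dict.
        for name in self.params:
            self.params[name] = state[prefix + name]
        for child_name, child in self.children.items():
            child.load_state_dict(state, prefix + child_name + ".")


class TransparentWrapper(ToyModule):
    """Hypothetical wrapper that keeps the wrapped module's original FQNs."""

    def __init__(self, wrapped: ToyModule) -> None:
        super().__init__()
        # Without the overrides below, every key would gain a "_wrapped." segment.
        self.children["_wrapped"] = wrapped

    def named_parameters(self, prefix: str = "") -> Iterator[Tuple[str, float]]:
        # Pass the caller's prefix through unchanged, hiding "_wrapped.".
        yield from self.children["_wrapped"].named_parameters(prefix)

    def load_state_dict(self, state: Dict[str, float], prefix: str = "") -> None:
        self.children["_wrapped"].load_state_dict(state, prefix)


def build_model(wrap: bool) -> ToyModule:
    preproc = ToyModule()
    preproc.params["scale"] = 2.0
    model = ToyModule()
    model.children["preproc"] = TransparentWrapper(preproc) if wrap else preproc
    return model


plain_keys = sorted(build_model(wrap=False).state_dict())
wrapped_keys = sorted(build_model(wrap=True).state_dict())
print(plain_keys, wrapped_keys)
```

In this toy, the wrapped and unwrapped models expose the same keys, so a checkpoint written by one can be read by the other. The wrapper's load_state_dict override only takes effect because the parent calls it, which is the dependency the limitation below describes.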
Potential limitation: This solution relies on the load_state_dict override in DistributedModelParallel to adjust the way model modules restore their state, the same override that ShardedModule relies on. This means that using PipelinedPreproc outside DistributedModelParallel might cause the model to fail to restore from a checkpoint. However, like ShardedModule, PipelinedPreproc is not meant to be applied to models directly; it is injected as part of the model rewrite for the semi-sync pipeline. TL;DR: it should not happen unless someone is actively doing the wrong thing.

Differential Revision: D64572844