pytorch / torchrec

Pytorch domain library for recommendation systems
https://pytorch.org/torchrec/
BSD 3-Clause "New" or "Revised" License

Ensure state_dict fqn unchanged for pipelined postproc modules #2503

Closed che-sh closed 3 weeks ago

che-sh commented 1 month ago

Summary: To avoid issues with checkpointing and restoring, PipelinedPreproc should be "transparent" to operations that depend on the model structure, specifically the ones that save or load the state_dict. To achieve this, we preserve the original FQN (fully qualified name) of the preproc module inside the PipelinedPreproc class that wraps it. The relevant methods are:

1) named_modules
2) named_parameters
3) named_buffers
4) state_dict
5) load_state_dict
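As a rough illustration of what "transparent" means for these five methods, here is a minimal sketch in plain PyTorch. `TransparentWrapper`, `_wrapped`, and `Model` are hypothetical names chosen for this example; this is not the actual PipelinedPreproc implementation, only one way such overrides could look.

```python
from torch import nn


class TransparentWrapper(nn.Module):
    """Hypothetical stand-in for a pipelined wrapper; not TorchRec's class."""

    def __init__(self, wrapped: nn.Module) -> None:
        super().__init__()
        self._wrapped = wrapped

    def forward(self, *args, **kwargs):
        return self._wrapped(*args, **kwargs)

    def named_modules(self, memo=None, prefix="", remove_duplicate=True):
        if memo is None:
            memo = set()
        if self not in memo:
            if remove_duplicate:
                memo.add(self)
            yield prefix, self
            # Pass the prefix through unchanged instead of appending "_wrapped".
            yield from self._wrapped.named_modules(memo, prefix, remove_duplicate)

    def named_parameters(self, prefix="", recurse=True, remove_duplicate=True):
        yield from self._wrapped.named_parameters(
            prefix=prefix, recurse=recurse, remove_duplicate=remove_duplicate
        )

    def named_buffers(self, prefix="", recurse=True, remove_duplicate=True):
        yield from self._wrapped.named_buffers(
            prefix=prefix, recurse=recurse, remove_duplicate=remove_duplicate
        )

    def state_dict(self, *args, destination=None, prefix="", keep_vars=False):
        # Save the inner module's entries directly under the wrapper's prefix.
        return self._wrapped.state_dict(
            *args, destination=destination, prefix=prefix, keep_vars=keep_vars
        )

    def load_state_dict(self, state_dict, *args, **kwargs):
        return self._wrapped.load_state_dict(state_dict, *args, **kwargs)


class Model(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.pre = TransparentWrapper(nn.Linear(2, 2))


model = Model()
state_keys = list(model.state_dict().keys())
param_names = [name for name, _ in model.named_parameters()]
print(state_keys, param_names)
```

In this sketch both lists contain `pre.weight` and `pre.bias`, the same names the model would expose if `pre` were the bare `nn.Linear`, rather than `pre._wrapped.weight` and `pre._wrapped.bias`.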

Potential limitation: This solution relies on the load_state_dict override in DistributedModelParallel to adjust the way model modules restore their state - the same override that ShardedModule relies on. This means that using PipelinedPreproc outside DistributedModelParallel might cause the model to fail to restore from a checkpoint. However, similar to ShardedModule, PipelinedPreproc is not meant to be used directly on models; it is injected as part of the model rewrite for the semi-sync pipeline. TL;DR: it should not happen, unless someone is actively doing the wrong thing.
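The limitation can be reproduced with plain PyTorch. The sketch below is an assumption-laden illustration, not TorchRec code: `FlatStateWrapper` and `Model` are hypothetical, and the parent is an ordinary `nn.Module` rather than DistributedModelParallel. The stock `nn.Module.load_state_dict` walks `_modules` by attribute name, so it looks for keys under the nested name even though the wrapper saved them under the flattened one.

```python
from torch import nn


class FlatStateWrapper(nn.Module):
    """Hypothetical wrapper that saves the inner state under its own prefix."""

    def __init__(self, wrapped: nn.Module) -> None:
        super().__init__()
        self._wrapped = wrapped

    def state_dict(self, *args, destination=None, prefix="", keep_vars=False):
        return self._wrapped.state_dict(
            *args, destination=destination, prefix=prefix, keep_vars=keep_vars
        )


class Model(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.pre = FlatStateWrapper(nn.Linear(2, 2))


model = Model()
checkpoint = model.state_dict()  # keys are saved as "pre.weight", "pre.bias"

# The default loader descends into the "_wrapped" child and expects
# "pre._wrapped.*", so the flattened keys do not line up.
result = model.load_state_dict(checkpoint, strict=False)
missing = set(result.missing_keys)
unexpected = set(result.unexpected_keys)
print(missing, unexpected)
```

With `strict=True` (the default) the same call raises a `RuntimeError`, which is why a parent-level load_state_dict override is needed for the flattened names to round-trip.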

Differential Revision: D64572844

facebook-github-bot commented 1 month ago

This pull request was exported from Phabricator. Differential Revision: D64572844
