pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

[FSDP] `ignored_modules` follow-ups #77141

Open awgu opened 2 years ago

awgu commented 2 years ago

This issue tracks a few follow-ups regarding `ignored_modules`.

  1. Users may want to ignore specific parameters or buffers within a module rather than the whole module. How should we modify the API to accommodate this?
  2. What should happen if a user passes a module into `ignored_modules` that contains a submodule that is an FSDP instance?
    • The current implementation ignores it without warning; alternatives are to issue a warning or to raise an error.
  3. How should `ignored_modules` interact with shared parameters and buffers? For example, what should happen if a parameter belongs to a module in `ignored_modules` but also to a module that is not in `ignored_modules`? What if that latter module is an FSDP instance that has already flattened its parameters?
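As a rough illustration of questions 2 and 3, the sketch below walks a plain `torch.nn` module tree. It (a) warns when an ignored subtree contains a module matched by a caller-supplied `is_fsdp` predicate, and (b) reports parameters or buffers that an ignored module shares with a non-ignored module. The helper `check_ignored_modules` is hypothetical and not part of FSDP. It assumes that ignoring a module ignores its whole subtree. With real FSDP, the predicate might be `lambda m: isinstance(m, FullyShardedDataParallel)`, using the class from `torch.distributed.fsdp`.

```python
import itertools
import warnings
from typing import Callable, Iterable, List, Tuple

import torch.nn as nn


def check_ignored_modules(
    root: nn.Module,
    ignored_modules: Iterable[nn.Module],
    is_fsdp: Callable[[nn.Module], bool] = lambda m: False,
) -> List[Tuple[str, str]]:
    """Hypothetical validation helper; not part of the FSDP API.

    Returns (module_name, tensor_name) pairs for parameters/buffers that a
    non-ignored module shares with an ignored module.
    """
    # Assumption: ignoring a module ignores its entire subtree.
    ignored_subtree = {sub for m in ignored_modules for sub in m.modules()}

    # Question 2: surface nested FSDP instances instead of silently ignoring them.
    for m in ignored_subtree:
        if is_fsdp(m):
            warnings.warn(
                f"An ignored module's subtree contains an FSDP instance ({type(m).__name__})"
            )

    # Track tensors by identity so that tied/shared tensors are detected.
    ignored_ids = {
        id(t)
        for m in ignored_subtree
        for t in itertools.chain(m.parameters(recurse=False), m.buffers(recurse=False))
    }

    # Question 3: tensors owned directly by a non-ignored module that are also ignored.
    shared = []
    for mod_name, m in root.named_modules():
        if m in ignored_subtree:
            continue
        for t_name, t in itertools.chain(
            m.named_parameters(recurse=False), m.named_buffers(recurse=False)
        ):
            if id(t) in ignored_ids:
                shared.append((mod_name, t_name))
    return shared
```

For example, if a model ties `model.b.weight = model.a.weight` and the user passes `ignored_modules=[model.a]`, the helper reports `[("b", "weight")]`. Question 3 asks how this ambiguous case should be resolved.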

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @SciPioneer @H-Huang

awgu commented 2 years ago
  1. We should add tests (and possibly fix the implementation) for the case where `ignored_modules` violates the superset property, i.e. a parent FSDP instance's `ignored_modules` is not a superset of a child FSDP instance's `ignored_modules`.
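A test for this item might start from a check like the one below. This is a minimal sketch. It assumes each FSDP instance's effective ignored set is the union of the subtrees of its `ignored_modules`. The helper names `_expand` and `superset_violations` are hypothetical and not part of FSDP.

```python
from typing import Iterable, List, Set

import torch.nn as nn


def _expand(modules: Iterable[nn.Module]) -> Set[nn.Module]:
    # Assumption: ignoring a module ignores every module in its subtree.
    return {sub for m in modules for sub in m.modules()}


def superset_violations(
    parent_ignored: Iterable[nn.Module], child_ignored: Iterable[nn.Module]
) -> List[nn.Module]:
    """Return modules the child ignores but the parent does not (hypothetical helper)."""
    parent_set = _expand(parent_ignored)
    # Any module in the child's ignored set missing from the parent's breaks the property.
    return [m for m in _expand(child_ignored) if m not in parent_set]
```

The helper returns the offending modules rather than a bare boolean. That way, a test or error message can name exactly which submodules break the property.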