Closed · thepowerfuldeez closed this issue 1 month ago
@thepowerfuldeez is there a reason why you're trying to dynamically change the state dict type (and also manually calling set_state_dict_type)? We should only be relying on how you instantiate the FSDPPlugin here, which is why you found this change.
@muellerzr I remember that CPU offload didn't work with the sharded state dict, which is why it's changed dynamically in the code. I tried removing that part of the code and setting it from the config instead, but that doesn't work either.
If we're doing it this way, you would need to first override it manually on the plugin, then call set_state_dict_type(). But I really need some code to wrap around this workflow.
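A minimal sketch of the suggested ordering (override the attribute on the plugin first, then call the setter so the change is actually applied). This uses a stand-in plugin class purely to illustrate the pattern; the real class is accelerate's FullyShardedDataParallelPlugin, and the fallback behavior of set_state_dict_type() shown here is an assumption, not accelerate's exact implementation:

```python
from enum import Enum, auto


class StateDictType(Enum):
    FULL_STATE_DICT = auto()
    SHARDED_STATE_DICT = auto()


class FakeFSDPPlugin:
    """Stand-in for accelerate's FullyShardedDataParallelPlugin,
    only meant to show the override-then-apply ordering."""

    def __init__(self, state_dict_type=StateDictType.SHARDED_STATE_DICT):
        self.state_dict_type = state_dict_type
        self.applied_type = None  # what would be propagated to FSDP

    def set_state_dict_type(self, state_dict_type=None):
        # Assumed behavior: with no argument, fall back to whatever
        # is currently set on the plugin, then propagate it.
        if state_dict_type is not None:
            self.state_dict_type = state_dict_type
        self.applied_type = self.state_dict_type


plugin = FakeFSDPPlugin()
# 1) Override the attribute on the plugin first...
plugin.state_dict_type = StateDictType.FULL_STATE_DICT
# 2) ...then call set_state_dict_type() so the override takes effect.
plugin.set_state_dict_type()
print(plugin.applied_type.name)  # FULL_STATE_DICT
```

Calling set_state_dict_type() without the preceding override would just re-apply the sharded default, which matches the ordering described above.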
System Info

Information

Tasks

no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)

Reproduction
trainer.accelerator.state.fsdp_plugin.set_state_dict_type("FULL_STATE_DICT")
(Trying to set FULL_STATE_DICT from the FSDP config doesn't work either.)
Expected behavior
trainer.accelerator.state.fsdp_plugin.set_state_dict_type("FULL_STATE_DICT") works.