huggingface / accelerate


fsdp.md - needs updating to accommodate diffusers models? #3089

Open · christopher-beckham opened this issue 1 month ago

christopher-beckham commented 1 month ago

Hi,

In the FSDP docs it says:

When using transformers save_pretrained, pass state_dict=accelerator.get_state_dict(model) to save the model state dict. Below is an example:

  unwrapped_model.save_pretrained(
      args.output_dir,
      is_main_process=accelerator.is_main_process,
      save_function=accelerator.save,
      state_dict=accelerator.get_state_dict(model),
  )

In diffusers (I can't speak for transformers), save_pretrained on anything that implements the ModelMixin class doesn't actually support passing in a custom state dict. While save_pretrained does take **kwargs, those are specifically kwargs to be passed on to push_to_hub:

https://github.com/huggingface/diffusers/blob/8cdcdd9e32925200ce5e1cf410fe14a774f3c3a6/src/diffusers/models/modeling_utils.py#L266-L275
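Concretely, as far as I can tell the extra kwarg is just silently swallowed, so the docs' pattern ends up saving the wrapped module's own parameters rather than the gathered FSDP state dict. A sketch of the failure mode (based on my reading of the linked code):

unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
    args.output_dir,
    is_main_process=accelerator.is_main_process,
    save_function=accelerator.save,
    state_dict=accelerator.get_state_dict(model),  # not a recognized argument; lands in **kwargs and is never used
)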

It is probably worth modifying fsdp.md to say that, in the case of diffusers, you might be better off doing something like:

import os
from safetensors.torch import save_file

model.save_config(save_dir)  # save_config is from ConfigMixin; writes config.json
# save_file wants a file path, not a directory:
save_file(accelerator.get_state_dict(model), os.path.join(save_dir, "diffusion_pytorch_model.safetensors"))

i.e., we have to be a bit hacky and save the state dict ourselves along with the config file. (There may be a more optimal solution, but I'm not a wizard at this.)
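For completeness, here is a slightly fuller sketch of that workaround that should be safe in the multi-process case. Assumptions: model was prepared by accelerator, get_state_dict is called on every rank so FSDP can gather the shards, and "diffusion_pytorch_model.safetensors" is the weights filename diffusers expects (worth double-checking against your diffusers version):

import os
from safetensors.torch import save_file

# Must run on all ranks: under FSDP this gathers the full, unsharded state dict.
state_dict = accelerator.get_state_dict(model)

if accelerator.is_main_process:
    os.makedirs(save_dir, exist_ok=True)
    unwrapped = accelerator.unwrap_model(model)
    unwrapped.save_config(save_dir)  # from ConfigMixin; writes config.json
    save_file(state_dict, os.path.join(save_dir, "diffusion_pytorch_model.safetensors"))

Loading it back with MyModelClass.from_pretrained(save_dir) should then work, since diffusers only needs config.json plus the safetensors file in that directory (MyModelClass here is a stand-in for whatever ModelMixin subclass was saved).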

Any thoughts? Thanks.

github-actions[bot] commented 2 days ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.