huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Feature request: FSDP for TPUs #422

Open OhadRubin opened 2 years ago

OhadRubin commented 2 years ago

A recent contribution to the pytorch_xla repo allows using FSDP in PyTorch XLA to shard module parameters across data-parallel workers: https://github.com/pytorch/xla/pull/3431. Some motivation: it may be possible to perform inference with OPT 30B on Google Colab without needing a Pro subscription, which I think many people will appreciate. What would be needed to add this to accelerate?
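
For context, here is a minimal sketch of how the wrapper from that PR is used directly (independent of accelerate), assuming the `torch_xla.distributed.fsdp.XlaFullyShardedDataParallel` class it introduces and a TPU runtime with `torch_xla` installed:

```python
# Minimal sketch of the XLA FSDP wrapper from pytorch/xla#3431, used standalone.
# Assumes a TPU/XLA runtime; the toy Linear model is just for illustration.
import torch
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP

device = xm.xla_device()
# Wrapping shards the module's parameters across data-parallel workers.
model = FSDP(torch.nn.Linear(1024, 1024).to(device))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

inputs = torch.randn(8, 1024, device=device)
loss = model(inputs).sum()
loss.backward()
optimizer.step()  # plain optimizer.step(); gradients are already reduce-scattered by FSDP
xm.mark_step()    # flush the pending XLA graph
```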

muellerzr commented 2 years ago

Once the next release of PyTorch XLA is out, we'll start taking a look at this.

Vatshank commented 1 year ago

Hey @muellerzr, is there ongoing work on adding XLA support for FSDP? We, on the AWS SageMaker training compiler side, have started looking into XLA-FSDP and might be able to contribute to adding such support to accelerate.

muellerzr commented 1 year ago

@Vatshank not yet! It's the next thing on my list to get to after TPU pod support, so would love the help if you guys can! 🙏

Vatshank commented 1 year ago

Okay cool @muellerzr! Although our focus is on GPUs, I am sure there will be significant overlap in the code for adding support for either device type.

What do you think would be a good way to discuss some of these implementation details? A shared Slack group for development, for instance? Also happy to continue bugging you on GitHub, if that's preferred :)

muellerzr commented 1 year ago

@Vatshank this gh issue should be fine!

JackCaoG commented 1 year ago

@AlexWertheim With your recent PR, can we call this request done?

AlexWertheim commented 1 year ago

> @AlexWertheim With your recent PR, can we call this request done?

Yeah, I think so. For reference, the PR in question can be seen here. @muellerzr can say better than I can whether this fulfills all requirements where accelerate is concerned.
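
In case it helps anyone landing here, a rough sketch of what the training script looks like from the accelerate side. The `Accelerator` calls below are the library's standard API; the assumption (not spelled out in this thread) is that the TPU/FSDP wrapping itself is selected through `accelerate config` rather than extra code in the script:

```python
# Hypothetical usage sketch: exercising XLA FSDP through accelerate's standard API,
# assuming FSDP-on-TPU has been selected via `accelerate config`.
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up the distributed/FSDP settings from the config

model = torch.nn.Linear(1024, 1024)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer = accelerator.prepare(model, optimizer)  # wrapping/sharding happens here

inputs = torch.randn(8, 1024, device=accelerator.device)
loss = model(inputs).sum()
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
```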