skypilot-org / skypilot

SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

[Example] Distributed training example with accelerate #2781

Michaelvll opened 7 months ago

Michaelvll commented 7 months ago

A user requested an example of running distributed training with Hugging Face Accelerate.
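The following is a minimal sketch, not an official SkyPilot example, of how such a task could work. It assumes that a multi-node SkyPilot task exposes the `SKYPILOT_NODE_RANK`, `SKYPILOT_NODE_IPS` (newline-separated, head node first), `SKYPILOT_NUM_NODES`, and `SKYPILOT_NUM_GPUS_PER_NODE` environment variables on every node. The helper turns those variables into `accelerate launch` flags. The function name `build_accelerate_cmd`, the default port 29500, and the script name `train.py` are hypothetical.

```python
import os
from typing import List, Mapping, Optional


def build_accelerate_cmd(
    script: str,
    env: Optional[Mapping[str, str]] = None,
    port: int = 29500,  # hypothetical default rendezvous port
) -> List[str]:
    """Build an `accelerate launch` argv from SkyPilot's per-node env vars (assumed)."""
    env = os.environ if env is None else env
    # Assumption: SKYPILOT_NODE_IPS lists one IP per line; the first is the head node.
    node_ips = [ip.strip() for ip in env["SKYPILOT_NODE_IPS"].splitlines() if ip.strip()]
    num_nodes = int(env["SKYPILOT_NUM_NODES"])
    gpus_per_node = int(env["SKYPILOT_NUM_GPUS_PER_NODE"])
    rank = int(env["SKYPILOT_NODE_RANK"])
    total_procs = num_nodes * gpus_per_node  # Accelerate counts processes across all machines

    cmd = [
        "accelerate", "launch",
        "--num_machines", str(num_nodes),
        "--num_processes", str(total_procs),
        "--machine_rank", str(rank),
        "--main_process_ip", node_ips[0],
        "--main_process_port", str(port),
    ]
    if total_procs > 1:
        cmd.append("--multi_gpu")  # request distributed GPU training
    cmd.append(script)
    return cmd
```

Each node would run the returned command, for example through `subprocess.run`. Only `--machine_rank` differs between nodes, so every node points at the same head-node address and port.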

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] commented 3 months ago

This issue was closed because it has been stalled for 10 days with no activity.