skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

[GPU] Add support for AMD GPUs #2648

Closed. Michaelvll closed this issue 2 weeks ago.

Michaelvll commented 1 year ago

We should consider adding support for AMD GPUs, which have been shown to run ML workloads efficiently.

References:
- https://www.amd.com/en/technologies/deep-machine-learning
- https://www.lamini.ai/blog/lamini-amd-paving-the-road-to-gpu-rich-enterprise-llms
- https://blog.mlc.ai/2023/08/09/Making-AMD-GPUs-competitive-for-LLM-inference
- https://www.mosaicml.com/blog/amd-mi250

github-actions[bot] commented 10 months ago

This issue is stale because it has been open 120 days with no activity. Remove the stale label or comment, or this will be closed in 10 days.

github-actions[bot] commented 9 months ago

This issue was closed because it has been stalled for 10 days with no activity.

deke997 commented 9 months ago

We are also interested in using SkyPilot on AMD GPUs.

What changes are necessary to support this?

romilbhardwaj commented 9 months ago

Hi @deke997, what kind of cluster do you have? Does it run an orchestration layer, such as Kubernetes?

We have a proof-of-concept PR for AMD GPUs on Kubernetes clusters here: https://github.com/skypilot-org/skypilot/pull/3209

Let us know what you think!

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open 120 days with no activity. Remove the stale label or comment, or this will be closed in 10 days.

github-actions[bot] commented 2 weeks ago

This issue was closed because it has been stalled for 10 days with no activity.