skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0

[GPU] Add support for AMD GPUs #2648

Open Michaelvll opened 1 year ago

Michaelvll commented 1 year ago

We should consider adding support for AMD GPUs, which have been shown to be efficient for ML workloads (see the references below).

References:
https://www.amd.com/en/technologies/deep-machine-learning
https://www.lamini.ai/blog/lamini-amd-paving-the-road-to-gpu-rich-enterprise-llms
https://blog.mlc.ai/2023/08/09/Making-AMD-GPUs-competitive-for-LLM-inference
https://www.mosaicml.com/blog/amd-mi250

github-actions[bot] commented 8 months ago

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] commented 8 months ago

This issue was closed because it has been stalled for 10 days with no activity.

deke997 commented 7 months ago

We are also interested in using SkyPilot on AMD GPUs.

What changes are necessary to support this?

romilbhardwaj commented 7 months ago

Hi @deke997, what kind of cluster do you have? Does it run an orchestration layer, such as Kubernetes?

We have a PoC PR for AMD on Kubernetes clusters here - https://github.com/skypilot-org/skypilot/pull/3209

Let us know what you think!