ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[CI] Add neuron_cores drivers support on ray-ml docker image #39221

Open chappidim opened 1 year ago

chappidim commented 1 year ago

Description

Currently, ray-ml has a gpu label, which installs NVIDIA and other crucial GPU drivers. Now that we support NeuronCore and other accelerators, it would benefit users to have a ray-ml image with neuron-core/EFA drivers preinstalled. This would cut autoscaler and node-provisioning time from minutes to seconds.

can-anyscale commented 11 months ago

CC: @krfricke, @matthewdeng, looks like a decision for the ML team to make?

anyscalesam commented 11 months ago

@matthewdeng can you please set priority and advise on timeline?

matthewdeng commented 11 months ago

This is a P2 on our side, but @chappidim if you are able to work on this we are happy to help.