ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Feature Request: Support for Habana HPU Accelerators #37635

Open htang2012 opened 1 year ago

htang2012 commented 1 year ago

Description

Hello Ray maintainers and community,

We've been using Ray for our work and find it to be a valuable tool for scalable, distributed machine learning. I believe it would benefit many users if support for Habana HPU accelerators were added.

Habana Labs, an Intel company, designs and manufactures specialized hardware accelerators for artificial intelligence workloads, known as Habana Processing Units (HPUs). Habana's accelerators are becoming more prevalent in data centers due to their high performance and efficiency. By adding support for these accelerators, Ray could tap into the growing community of researchers and engineers using Habana's hardware.

Support for HPUs could potentially be implemented in a similar way to how Ray currently supports other devices and collective communication backends, such as CPUs with gloo and GPUs with NCCL.

Thank you for considering this feature request. I'm looking forward to any discussion on this matter.

Best regards,

Henry Tang

Use case

The same as the GPU and NCCL backend for distributed deployment: we would expect to run on HPUs with HCCL (the Habana distributed backend).
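To make the request concrete, here is a minimal sketch of the device-to-backend pairing described above. It assumes that HPUs would be requested through Ray's existing custom `resources` argument under a resource name of "HPU" (an assumption, not a confirmed Ray feature), and that the collective backend is named "hccl", as registered by Habana's PyTorch bridge. The helpers `select_backend` and `resource_request` are hypothetical names used only for illustration.

```python
# Illustrative only: maps a device type to the collective backend it would
# pair with. "gloo" and "nccl" are standard torch.distributed backend names;
# "hccl" is the name used by Habana's PyTorch integration.
BACKEND_FOR_DEVICE = {"cpu": "gloo", "gpu": "nccl", "hpu": "hccl"}


def select_backend(device: str) -> str:
    """Return the collective backend name for a device type (hypothetical helper)."""
    try:
        return BACKEND_FOR_DEVICE[device.lower()]
    except KeyError:
        raise ValueError(f"unsupported device type: {device!r}") from None


def resource_request(device: str, count: int = 1) -> dict:
    """Build keyword arguments for a ray.remote(...) resource request.

    CPUs and GPUs use Ray's built-in num_cpus / num_gpus options. Any other
    device is requested as a custom resource, which is an assumption about
    how HPUs could be scheduled without first-class support.
    """
    device = device.lower()
    if device == "cpu":
        return {"num_cpus": count}
    if device == "gpu":
        return {"num_gpus": count}
    return {"resources": {device.upper(): count}}
```

On a cluster whose nodes were started with a matching custom resource (for example `ray.init(resources={"HPU": 8})`), a task could then be declared as `@ray.remote(**resource_request("hpu"))` and initialize its process group with `select_backend("hpu")`. This only covers scheduling; device visibility and isolation would still need support inside Ray itself.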

rkooo567 commented 12 months ago

https://github.com/ray-project/ray/pull/36493

We have an ongoing PR from Intel under review to support XPU. Is this a different type of hardware?

rkooo567 commented 12 months ago

https://github.com/ray-project/ray/blob/master/python/ray/util/accelerators/accelerators.md

This could be helpful if you are interested in contributing.

htang2012 commented 12 months ago

> 36493
>
> We have an ongoing PR from Intel under review to support XPU. Is this a different type of hardware?

Yes. Habana is a subsidiary of Intel, which acquired it in 2019, and the Habana HPU and the Intel GPU are different accelerators.