ray-project / ray-llm

RayLLM - LLMs on Ray
https://aviary.anyscale.com
Apache License 2.0

ray-llm support for ML Accelerators (Google's TPU, AWS Inferentia, etc.) #60

Closed · sudujr closed this issue 11 months ago

sudujr commented 11 months ago

Hi all,

I would love to know whether ray-llm will, in the future, support serving LLMs on ML accelerators. With vLLM and other optimizations, it is possible to achieve high throughput and low latency, so it would be great if we could use ray-llm on ML accelerators as well. (For reference, a sketch of the vLLM serving path this refers to is below.)
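For context, here is a minimal sketch of the kind of high-throughput serving being referenced, using vLLM's offline inference API as it works on GPUs today; the model name and sampling settings are placeholders, and the ask in this issue is for the same path on TPU / Inferentia:

```python
# Minimal vLLM offline-inference sketch (runs on a CUDA GPU with stock vLLM).
# Model and sampling parameters below are placeholders, not recommendations.
from vllm import LLM, SamplingParams

prompts = [
    "Explain what an ML accelerator is in one sentence.",
    "Why does continuous batching improve LLM serving throughput?",
]

# Tune temperature / max_tokens for your workload.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Loads the model onto the available accelerator.
llm = LLM(model="facebook/opt-125m")

# vLLM batches these prompts internally, which is where the throughput comes from.
for output in llm.generate(prompts, sampling_params):
    print(output.prompt, "->", output.outputs[0].text)
```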

richardliaw commented 11 months ago

Good question! We do plan to support TPU and Inferentia in the future, but it is probably not going to happen in the next 3 months. I'll close this issue for now, and we can discuss more when that timeframe arrives!
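For anyone who needs to target non-GPU hardware before first-class support lands, one possible workaround is Ray's generic custom-resource scheduling, which RayLLM is built on top of. To be clear, this is a hedged sketch and not a RayLLM feature: the `"TPU"` label is a user-defined resource name that must match whatever you register when starting the node, and the task body is a placeholder.

```python
# Sketch: routing work onto accelerator nodes with Ray custom resources.
# Nothing here is RayLLM-specific. The node must expose the label, e.g.:
#   ray start --resources='{"TPU": 4}'
import ray

ray.init(address="auto")  # connect to a cluster whose TPU nodes expose the label

@ray.remote(resources={"TPU": 1})
def run_on_accelerator(prompt: str) -> str:
    # Placeholder: real code would load a TPU/Inferentia-compiled model here.
    return f"processed: {prompt}"

print(ray.get(run_on_accelerator.remote("hello")))
```

This only handles placement; the actual model compilation and execution on the accelerator would still be up to the user until ray-llm ships native support.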