ray-project / ray-llm

RayLLM - LLMs on Ray
https://aviary.anyscale.com
Apache License 2.0

No example for quantized model #81

Closed: jinnig closed this issue 9 months ago

jinnig commented 10 months ago

Currently, the Llama 2 models require a significant number of GPUs to serve. Would it be possible to add support for quantized models to RayLLM? This would reduce the hardware requirements for serving Llama 2, making it accessible to a wider range of users.

I have not been able to find any examples of serving a quantized model with RayLLM, so it is not clear whether this is currently supported.

YQ-Wang commented 10 months ago

Please check https://github.com/ray-project/ray-llm/pull/82

jinnig commented 10 months ago

@YQ-Wang Thank you so much! The config was indeed helpful for serving the AWQ model :)
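
Editor's note: for readers who want a sense of what such a config looks like, below is a minimal sketch of a RayLLM model YAML for an AWQ-quantized checkpoint. It assumes the vLLM engine backend (which accepts an `awq` quantization mode) and uses `TheBloke/Llama-2-13B-chat-AWQ` as an illustrative Hugging Face model id; the field names follow the general layout of the model configs shipped in this repo, but they may differ from the exact config added in PR #82, so treat this as an illustration rather than the canonical example.

```yaml
# Illustrative RayLLM model config for an AWQ-quantized Llama 2 checkpoint.
# Field names and values are assumptions based on the repo's config layout;
# refer to the config introduced in PR #82 for the actual example.
deployment_config:
  autoscaling_config:
    min_replicas: 1
    max_replicas: 2

engine_config:
  # Hypothetical AWQ checkpoint used for illustration.
  model_id: TheBloke/Llama-2-13B-chat-AWQ
  hf_model_id: TheBloke/Llama-2-13B-chat-AWQ
  type: VLLMEngine
  engine_kwargs:
    quantization: awq        # tells vLLM to load AWQ-quantized weights
    gpu_memory_utilization: 0.9
    max_num_seqs: 64
  max_total_tokens: 4096
  # generation / prompt_format settings omitted for brevity

scaling_config:
  # A quantized 13B model can typically fit on a single GPU,
  # which is the main motivation raised in this issue.
  num_workers: 1
  num_gpus_per_worker: 1
  num_cpus_per_worker: 8
```

The key line is `quantization: awq` in `engine_kwargs`, which is passed through to the vLLM engine; the rest of the config is the same as for an unquantized model, just with a smaller GPU footprint.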