ray-project / ray-llm

RayLLM - LLMs on Ray

Add AWQ Quantized Llama 2 70B Model Config & Update README #82

Closed · YQ-Wang closed this 9 months ago

YQ-Wang commented 10 months ago

Description:

This pull request addresses issue #81, which pointed out the absence of examples for serving quantized models in RayLLM. Given the significant hardware requirements of the Llama 2 70B model, introducing quantized model support makes it accessible to users with more limited resources.

Changes:

- AWQ quantized model example: a model config demonstrating how to serve the AWQ-quantized Llama 2 70B model (see the sketch at the end of this comment).
- README update: added instructions for quantization configuration.

Closes https://github.com/ray-project/ray-llm/issues/81
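
For readers unfamiliar with RayLLM model configs, the sketch below shows roughly what an AWQ config looks like. It is illustrative rather than the exact file from this PR: the `TheBloke/Llama-2-70B-chat-AWQ` model id, the resource names, and all numeric values are assumptions patterned on the repo's other continuous-batching configs. `quantization: awq` is the vLLM engine argument that enables loading AWQ weights.

```yaml
# Sketch of an AWQ model config, patterned on the repo's other
# continuous-batching configs. The model id and all resource numbers
# are illustrative assumptions, not the exact values from this PR.
deployment_config:
  autoscaling_config:
    min_replicas: 1
    max_replicas: 2
    target_num_ongoing_requests_per_replica: 24
  max_concurrent_queries: 64
engine_config:
  model_id: TheBloke/Llama-2-70B-chat-AWQ      # assumed AWQ checkpoint
  hf_model_id: TheBloke/Llama-2-70B-chat-AWQ
  type: VLLMEngine
  engine_kwargs:
    quantization: awq                          # tells vLLM to load AWQ weights
    max_num_batched_tokens: 4096
  max_total_tokens: 4096
scaling_config:
  num_workers: 1
  num_gpus_per_worker: 1   # 4-bit AWQ shrinks 70B enough for far fewer GPUs
```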

YQ-Wang commented 10 months ago

> Could you add a Serve config for the AWQ model, so users can run it with `serve run`? You should be able to pattern-match the other config files.

Done
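
For context, the repo's Serve configs are thin wrappers that let `serve run` launch the router with one or more model configs. Below is a minimal sketch, assuming the `rayllm.backend:router_application` import path used by the existing Serve configs and a hypothetical path for the new AWQ model config:

```yaml
# Sketch of a Serve config for the AWQ model. The import_path is
# assumed to match the repo's other Serve configs; the model config
# path is a hypothetical example, not the exact file from this PR.
applications:
- name: router
  route_prefix: /
  import_path: rayllm.backend:router_application
  args:
    models:
      - ./models/continuous_batching/TheBloke--Llama-2-70B-chat-AWQ.yaml
```

With a file like this in place, a user would start the deployment with `serve run <path-to-serve-config>.yaml` from the repo root.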

shrekris-anyscale commented 10 months ago

Thanks for addressing my comments! I'll run the configs myself later this week, and then I'll approve this change.

richardliaw commented 9 months ago

cc @shrekris-anyscale to follow up on this!

shrekris-anyscale commented 9 months ago

@Yard1 confirmed that the config runs successfully. I'll approve and merge this change. Thanks again for your contribution @YQ-Wang!