ray-project / ray-llm

RayLLM - LLMs on Ray
https://aviary.anyscale.com
Apache License 2.0

Add AWQ and SqueezeLLM quantization configs #95

Closed (uvikas closed this 9 months ago)

uvikas commented 9 months ago

This pull request adds configs for AWQ and SqueezeLLM 4-bit weight-only quantization methods for Llama 2 models (7B, 13B, 70B). Quantization lets users deploy these models on cheaper hardware and with lower inference latency; a sketch of what such a config could look like is shown below.
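For context, RayLLM model configs are YAML files. The following is a minimal sketch of what a quantized-model entry could look like; the exact field values, the `engine_kwargs` contents, and the `TheBloke/Llama-2-7b-Chat-AWQ` model id are illustrative assumptions, not copied from this PR.

```yaml
# Hypothetical RayLLM model config enabling AWQ 4-bit quantization via the vLLM engine.
# Field values and the model id are illustrative assumptions, not taken from this PR.
deployment_config:
  autoscaling_config:
    min_replicas: 1
    max_replicas: 8
engine_config:
  model_id: TheBloke/Llama-2-7b-Chat-AWQ   # a pre-quantized AWQ checkpoint (assumed)
  type: VLLMEngine
  engine_kwargs:
    quantization: awq                      # "squeezellm" for SqueezeLLM checkpoints
    max_num_batched_tokens: 4096
scaling_config:
  num_workers: 1
  num_gpus_per_worker: 1
```

With 4-bit weights, the 7B model's parameters shrink to roughly 4 GB, so it fits on a single smaller GPU; that is the main hardware saving the description refers to.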