ray-project / ray-llm

RayLLM - LLMs on Ray
https://aviary.anyscale.com
Apache License 2.0

Add AWQ and SqueezeLLM quantization configs #95

Closed (uvikas closed this 9 months ago)

uvikas commented 9 months ago

This pull request adds configs for AWQ and SqueezeLLM 4-bit weight-only quantization methods for Llama 2 models (7B, 13B, 70B). Quantization lets users deploy these models on cheaper hardware and with lower inference latency; a sketch of what such a config could look like is shown below.
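For context, RayLLM model configs are YAML files. The following is a minimal sketch of what a quantized-model entry could look like; the exact field values, the `engine_kwargs` contents, and the `TheBloke/Llama-2-7b-Chat-AWQ` model id are illustrative assumptions, not copied from this PR.

```yaml
# Hypothetical RayLLM model config enabling AWQ 4-bit quantization via the vLLM engine.
# Field values and the model id are illustrative assumptions, not taken from this PR.
deployment_config:
  autoscaling_config:
    min_replicas: 1
    max_replicas: 8
engine_config:
  model_id: TheBloke/Llama-2-7b-Chat-AWQ   # a pre-quantized AWQ checkpoint (assumed)
  type: VLLMEngine
  engine_kwargs:
    quantization: awq                      # "squeezellm" for SqueezeLLM checkpoints
    max_num_batched_tokens: 4096
scaling_config:
  num_workers: 1
  num_gpus_per_worker: 1
```

With 4-bit weights, the 7B model's parameters shrink to roughly 4 GB, so it fits on a single smaller GPU; that is the main hardware saving the description refers to.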