runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
MIT License
220 stars 85 forks

Feat: AWQ quantisation support #16

Closed by willsamu 10 months ago

willsamu commented 10 months ago

Adds support for AWQ quantisation via a new build-arg, QUANTIZATION.

Updated the vLLM core dependency to support the new feature.

Tested to work with TheBloke/airoboros-l2-7B-3.0-AWQ and TheBloke/Airoboros-L2-70B-3.1.2-AWQ.
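For context, here is a minimal sketch of how a quantization setting like this is typically forwarded to the vLLM engine. The engine argument `quantization="awq"` and the model name come from vLLM and the models tested in this PR; the exact wiring inside the worker (reading the setting from an environment variable) is an assumption for illustration only.

```python
import os
from vllm import LLM, SamplingParams

# Assumption: the QUANTIZATION build-arg surfaces as an environment variable
# that the worker forwards to the vLLM engine (e.g. "awq").
quantization = os.environ.get("QUANTIZATION") or None

llm = LLM(
    model="TheBloke/airoboros-l2-7B-3.0-AWQ",  # one of the models tested in this PR
    quantization=quantization,                 # enables AWQ weight loading in vLLM
)

outputs = llm.generate(
    ["Explain AWQ quantization in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

At image build time, this would presumably correspond to passing something like `--build-arg QUANTIZATION=awq` to `docker build`, per the build-arg described above; the exact accepted values are not specified in this thread.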

alpayariyak commented 10 months ago

Great work @willsamu, thank you for your contribution!

alpayariyak commented 10 months ago

Commit: 4f792062aaea02c526ee906979925b447811ef48