I am wondering if there is a way to load a model with quantization? I can load my model with AWQ quantization using the vLLM api_server, but I am not seeing support for it on serverless endpoints.
Thanks!
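For context, a minimal sketch of the plain-vLLM usage that works here (the model name is just a placeholder for any AWQ checkpoint):

```python
from vllm import LLM, SamplingParams

# Load an AWQ-quantized checkpoint; `quantization` is the relevant argument.
llm = LLM(
    model="your-org/your-awq-model",  # placeholder: any AWQ-quantized HF repo
    quantization="awq",
    dtype="half",  # AWQ kernels run on fp16 activations
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.8, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```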
I have added the implementation and opened a PR: https://github.com/runpod-workers/worker-vllm/pull/16. Only tested on two models so far, but it works for me. Let's wait for the maintainers' response.
Thank you!!
Quantization now supported on the main branch. Thanks!
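For anyone wiring this up themselves, here is a minimal sketch of how a serverless worker could pass a quantization setting through to the vLLM engine. The environment variable names (`MODEL_NAME`, `QUANTIZATION`) are assumptions for illustration, not necessarily what worker-vllm actually uses; see PR #16 and the README for the real configuration.

```python
import os

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

# Sketch only: the env var names are assumptions, not worker-vllm's real interface.
engine_args = AsyncEngineArgs(
    model=os.environ["MODEL_NAME"],                        # e.g. an AWQ checkpoint
    quantization=os.environ.get("QUANTIZATION") or None,   # e.g. "awq"; unset = no quantization
    dtype="half",
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```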
WillReynolds5 closed this 11 months ago.