runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

Bitsandbytes support #99

Open ilyalasy opened 4 weeks ago

ilyalasy commented 4 weeks ago

Hi there! vLLM supports bitsandbytes quantization, but there is no bitsandbytes dependency in requirements.txt. Are there any plans to add it?
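
For context, a minimal sketch of how vLLM's bitsandbytes quantization is typically invoked once the `bitsandbytes` package is installed alongside `vllm`. The model name here is just an illustrative example, not anything specific to this worker:

```python
# Sketch: in-flight bitsandbytes quantization with vLLM.
# Requires `pip install bitsandbytes` in addition to vllm.
from vllm import LLM, SamplingParams

llm = LLM(
    model="unsloth/llama-3-8b-bnb-4bit",  # example checkpoint, not the worker's default
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)

params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```

Without bitsandbytes in the image's requirements, passing `quantization="bitsandbytes"` through the worker's engine args would presumably fail at import time, which is what this request is about.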