runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
MIT License
213 stars, 81 forks

v0.3.0: OpenAI Compatibility, Dynamic Stream Batching, Refactor, Error Responses, more #47

Closed: alpayariyak closed this 6 months ago