runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
MIT License

Update vllm version to 0.2.1 #11

Closed kenny019 closed 10 months ago

kenny019 commented 11 months ago

feat: add quantization environment option
fix: set lower default max_num_batched_tokens

Resolves #9 (Update for vllm 0.2.0)
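The two changes above (an environment-driven quantization option and a lower default for `max_num_batched_tokens`) could be wired up roughly as follows. This is a minimal sketch, not the worker's actual code: the variable names `QUANTIZATION` and `MAX_NUM_BATCHED_TOKENS` and the default of 4096 are assumptions for illustration.

```python
import os


def build_engine_kwargs(env=os.environ):
    """Collect vLLM engine keyword arguments from the worker environment.

    QUANTIZATION and MAX_NUM_BATCHED_TOKENS are hypothetical variable
    names used for illustration; the worker's real names may differ.
    """
    kwargs = {}

    quantization = env.get("QUANTIZATION")
    if quantization:
        # vLLM 0.2.x supports quantized models, e.g. quantization="awq"
        kwargs["quantization"] = quantization

    # A lower default caps how many tokens are scheduled per batch,
    # reducing peak GPU memory at some cost in throughput.
    kwargs["max_num_batched_tokens"] = int(
        env.get("MAX_NUM_BATCHED_TOKENS", 4096)
    )
    return kwargs
```

The resulting dict would then be passed through to the vLLM engine constructor, e.g. `build_engine_kwargs({"QUANTIZATION": "awq"})` yields `{"quantization": "awq", "max_num_batched_tokens": 4096}`.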

alpayariyak commented 10 months ago

Thank you for your work @kenny019. The main branch is now running vllm 0.2.1.post1. Commit: 4f792062aaea02c526ee906979925b447811ef48