runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

PSA: Here's how to get it working with Mistral #15

Closed · Palmik closed this issue 11 months ago

Palmik commented 12 months ago

Update the vllm dependency to the latest version, and comment out max_num_batched_tokens in the handler.
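
For reference, a minimal sketch of what that change looks like in a vLLM-based handler. The file layout, model name, and surrounding code here are assumptions for illustration, not the exact repo code; the key point is simply dropping the explicit max_num_batched_tokens so vLLM chooses a value compatible with Mistral's context length.

```python
# Sketch of the suggested handler change (names are illustrative, not the repo's exact code).
# Also bump the vllm pin in requirements.txt to a release with Mistral support.
from vllm import AsyncEngineArgs, AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model="mistralai/Mistral-7B-Instruct-v0.1",  # example Mistral checkpoint
    # max_num_batched_tokens=4096,  # comment this out; a hardcoded value can
    #                               # conflict with Mistral's context window
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```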

alpayariyak commented 11 months ago

Thank you @Palmik