runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

PSA: Here's how to get it working with Mistral #15

Closed · Palmik closed this issue 11 months ago

Palmik commented 12 months ago

Update the vllm dependency to the latest version, and comment out max_num_batched_tokens in the handler.
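
For reference, a minimal sketch of what that change looks like in a vLLM-based handler. The file layout, model name, and surrounding code here are assumptions for illustration, not the exact repo code; the key point is simply dropping the explicit max_num_batched_tokens so vLLM chooses a value compatible with Mistral's context length.

```python
# Sketch of the suggested handler change (names are illustrative, not the repo's exact code).
# Also bump the vllm pin in requirements.txt to a release with Mistral support.
from vllm import AsyncEngineArgs, AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model="mistralai/Mistral-7B-Instruct-v0.1",  # example Mistral checkpoint
    # max_num_batched_tokens=4096,  # comment this out; a hardcoded value can
    #                               # conflict with Mistral's context window
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```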

alpayariyak commented 11 months ago

Thank you @Palmik