Closed: Permafacture closed this issue 1 month ago
Update: I installed vLLM 0.5.4 locally through pip and did not have this issue. It's specific to the RunPod worker.
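For comparison, a minimal sketch of that local check (the model name is a placeholder; the server is assumed started with vLLM's OpenAI-compatible entrypoint):

```python
# Local check: pip install vllm==0.5.4, then start the server with
#   python -m vllm.entrypoints.openai.api_server --model <MODEL_NAME>
from openai import OpenAI

# vLLM's local server does not require a real key, so any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Against a local pip install, the plain completions endpoint responds normally.
completion = client.completions.create(
    model="<MODEL_NAME>",
    prompt="San Francisco is a",
    max_tokens=7,
)
print(completion.choices[0].text)
```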
Seeing this too
Same issue in vLLM 0.5.3 (runpod/worker-v1-vllm:stable-cuda12.1.0)
@Permafacture thanks for reporting this problem.
We are working on resolving this; I will keep you updated!
Is the issue resolved?
Same issue with vLLM 0.5.4 on RunPod. Any news about this?
@ericflo @naaviii12345 @jamorell @Permafacture @Juhong-Namgung @prashantjoshi22
We just released a bug fix: runpod/worker-v1-vllm:v1.3.1dev-cuda12.1.0. Can you please check if this is working for you?
How do I try this on RunPod? I've only used the quick deploy for the vLLM worker.
This has been fixed.
Thanks team :-)
When trying to use the completions endpoint (rather than chat_completions) on a vLLM RunPod serverless instance, I get a server error. This happens with every model I've tried. The chat_completions endpoint works as expected.
This example from the vLLM quick start shows the issue.
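A minimal sketch of that call, assuming the OpenAI Python client and RunPod's OpenAI-compatible route (the endpoint ID, API key, and model name below are placeholders):

```python
from openai import OpenAI

# RunPod's OpenAI-compatible route for a serverless vLLM worker.
# <ENDPOINT_ID>, <RUNPOD_API_KEY>, and <MODEL_NAME> are placeholders.
client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

# chat.completions works against the same endpoint, but this call fails:
completion = client.completions.create(
    model="<MODEL_NAME>",
    prompt="San Francisco is a",
    max_tokens=7,
)
print(completion)
```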
On the client side I get a 500 error response. On the server I can see the error is:
'NoneType' object has no attribute 'headers'
This is using the most recent vLLM, 0.5.4.