lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

OpenAI server does not work with stream: true #2944

Closed · ArchiDevil closed this issue 9 months ago

ArchiDevil commented 9 months ago

I'm running the controller, model_worker, and OpenAI API server together with the CodeLlama-7b-Instruct-hf model. When I send a request with stream: true, it fails to answer. The request is the following:

{
  "model": "CodeLlama-7b-Instruct-hf",
  "messages": [
    {
      "role": "user",
      "content": "What are pros and cons of Python?"
    }
  ],
  "max_tokens": 1024,
  "stream": true
}
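
For reference, here is a minimal Python sketch of the same streaming call (using the requests library, which is not part of the report). It assumes the API server at http://localhost:8000 as launched below, and that the endpoint follows the usual OpenAI SSE framing of "data: ..." chunks ending with "data: [DONE]":

import json
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "CodeLlama-7b-Instruct-hf",
        "messages": [{"role": "user", "content": "What are pros and cons of Python?"}],
        "max_tokens": 1024,
        "stream": True,
    },
    stream=True,  # keep the connection open and read the body incrementally
)
for line in resp.iter_lines():
    if not line:
        continue  # skip blank SSE separator lines
    data = line.decode("utf-8")
    if data.startswith("data: "):
        data = data[len("data: "):]
    if data.strip() == "[DONE]":  # OpenAI-style end-of-stream marker
        break
    delta = json.loads(data)["choices"][0].get("delta", {})
    print(delta.get("content", ""), end="", flush=True)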

The log looks like the following:

2024-01-22 20:55:58 | INFO | stdout | INFO:     127.0.0.1:50668 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-01-22 20:55:59 | INFO | httpx | HTTP Request: POST http://localhost:21002/worker_generate_stream "HTTP/1.1 407 authenticationrequired"

The three components are launched with the following commands (I am on Windows):

py -m fastchat.serve.controller
py -m fastchat.serve.model_worker --model-path CodeLlama-7b-Instruct-hf
py -m fastchat.serve.openai_api_server --host localhost --port 8000

If I change stream to false, it works as expected. Are there any workarounds for this? I cannot change the request parameters.
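
One hedged lead, not confirmed in this thread: HTTP 407 means "Proxy Authentication Required" and is issued by a proxy rather than by the FastChat worker, so the streaming httpx call in the log above may be getting routed through a system proxy. Since httpx picks up HTTP_PROXY/HTTPS_PROXY from the environment by default, a speculative check and workaround looks like this:

# Diagnostic sketch, assuming a system proxy is intercepting the
# loopback call to the worker: print the proxy settings that httpx
# would pick up from the environment.
import os

for var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy", "NO_PROXY", "no_proxy"):
    print(f"{var}={os.environ.get(var)}")

# If a proxy is configured, exempting loopback addresses before starting
# the API server (e.g. `set no_proxy=localhost,127.0.0.1` in the same cmd
# window used for `py -m fastchat.serve.openai_api_server ...`) may let
# the worker call bypass it. This is an assumption, not a confirmed fix.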

nuocheng commented 5 months ago

I also encountered this problem. Have you solved it?