lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

OpenAI server does not work with stream: true #2944

Closed · ArchiDevil closed this issue 9 months ago

ArchiDevil commented 9 months ago

I'm running the controller, model_worker, and OpenAI API server together with the CodeLlama-7b-Instruct-hf model. When I send a request with stream: true, it fails to answer. The request is the following:

{
  "model": "CodeLlama-7b-Instruct-hf",
  "messages": [
    {
      "role": "user",
      "content": "What are pros and cons of Python?"
    }
  ],
  "max_tokens": 1024,
  "stream": true
}
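
For reference, here is a minimal Python sketch of the same streaming call (using the requests library, which is not part of the report). It assumes the API server at http://localhost:8000 as launched below, and that the endpoint follows the usual OpenAI SSE framing of "data: ..." chunks ending with "data: [DONE]":

import json
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "CodeLlama-7b-Instruct-hf",
        "messages": [{"role": "user", "content": "What are pros and cons of Python?"}],
        "max_tokens": 1024,
        "stream": True,
    },
    stream=True,  # keep the connection open and read the body incrementally
)
for line in resp.iter_lines():
    if not line:
        continue  # skip blank SSE separator lines
    data = line.decode("utf-8")
    if data.startswith("data: "):
        data = data[len("data: "):]
    if data.strip() == "[DONE]":  # OpenAI-style end-of-stream marker
        break
    delta = json.loads(data)["choices"][0].get("delta", {})
    print(delta.get("content", ""), end="", flush=True)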

The log looks like the following:

2024-01-22 20:55:58 | INFO | stdout | INFO:     127.0.0.1:50668 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-01-22 20:55:59 | INFO | httpx | HTTP Request: POST http://localhost:21002/worker_generate_stream "HTTP/1.1 407 authenticationrequired"

The three components are launched with the following commands (I am on Windows):

py -m fastchat.serve.controller
py -m fastchat.serve.model_worker --model-path CodeLlama-7b-Instruct-hf
py -m fastchat.serve.openai_api_server --host localhost --port 8000

If I change stream to false, it works as expected. Are there any workarounds for this? I cannot change the request parameters.
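
One hedged lead, not confirmed in this thread: HTTP 407 means "Proxy Authentication Required" and is issued by a proxy rather than by the FastChat worker, so the streaming httpx call in the log above may be getting routed through a system proxy. Since httpx picks up HTTP_PROXY/HTTPS_PROXY from the environment by default, a speculative check and workaround looks like this:

# Diagnostic sketch, assuming a system proxy is intercepting the
# loopback call to the worker: print the proxy settings that httpx
# would pick up from the environment.
import os

for var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy", "NO_PROXY", "no_proxy"):
    print(f"{var}={os.environ.get(var)}")

# If a proxy is configured, exempting loopback addresses before starting
# the API server (e.g. `set no_proxy=localhost,127.0.0.1` in the same cmd
# window used for `py -m fastchat.serve.openai_api_server ...`) may let
# the worker call bypass it. This is an assumption, not a confirmed fix.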

nuocheng commented 5 months ago

I also encountered this problem. Have you solved it?