I'm running the controller, model_worker and openai-server together with the CodeLlama-7b-Instruct-hf model. When I send a request with stream: true, it fails to answer. The request is the following:
{
  "model": "CodeLlama-7b-Instruct-hf",
  "messages": [
    {
      "role": "user",
      "content": "What are pros and cons of Python?"
    }
  ],
  "max_tokens": 1024,
  "stream": true
}
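For anyone who wants to reproduce this, the request above can be sent with a short script. This is only a minimal sketch using the standard library: the base URL `http://localhost:8000` is an assumption (the post does not state the openai-server address), and `build_payload`, `parse_sse_line` and `stream_chat` are hypothetical helper names. It assumes the server streams OpenAI-style `data: ...` lines terminated by `data: [DONE]`.

```python
import json
import urllib.request

# Assumed address of the openai-server; not stated in the post, adjust as needed.
BASE_URL = "http://localhost:8000"


def build_payload(stream=True):
    """Build the same request body as shown in the post."""
    return {
        "model": "CodeLlama-7b-Instruct-hf",
        "messages": [
            {"role": "user", "content": "What are pros and cons of Python?"}
        ],
        "max_tokens": 1024,
        "stream": stream,
    }


def parse_sse_line(line):
    """Return the decoded JSON chunk for a 'data: ...' line, else None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if not body or body == "[DONE]":
        return None
    return json.loads(body)


def stream_chat(base_url=BASE_URL):
    """Yield decoded chunks from a streaming chat completion request."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_payload()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            chunk = parse_sse_line(raw.decode("utf-8"))
            if chunk is not None:
                yield chunk


# Usage (requires the servers to be running):
# for chunk in stream_chat():
#     for choice in chunk.get("choices", []):
#         print(choice.get("delta", {}).get("content") or "", end="", flush=True)
```

Printing each raw chunk instead of only the `delta` content can help show whether the server returns an error object in place of generated text.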
The log looks like the following:
2024-01-22 20:55:58 | INFO | stdout | INFO: 127.0.0.1:50668 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-01-22 20:55:59 | INFO | httpx | HTTP Request: POST http://localhost:21002/worker_generate_stream "HTTP/1.1 407 authenticationrequired"
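The 407 in the second log line is the HTTP status "Proxy Authentication Required", which suggests the call to the worker at localhost:21002 may be going through a proxy instead of connecting directly. Whether that is the actual cause here is an assumption. The sketch below is one way to check which proxy settings Python picks up on this machine; it uses only the standard library, and `localhost_proxy_report` is a hypothetical helper name.

```python
import urllib.request


def localhost_proxy_report(host="localhost"):
    """Report the proxies Python detects and whether `host` bypasses them."""
    # getproxies() reads proxy environment variables and, on Windows,
    # falls back to the system proxy settings when none are set.
    proxies = urllib.request.getproxies()
    return {
        "proxies": proxies,
        "bypassed": bool(urllib.request.proxy_bypass(host)),
    }


# Usage:
# print(localhost_proxy_report())
```

If a proxy is listed and localhost is not bypassed, one thing to try is setting `NO_PROXY=localhost,127.0.0.1` in the environment before starting the three processes, since httpx honors that variable by default. This has not been tested against this setup.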
Parameters for all three components are the following (I use Windows):
If I change stream to false, it works as expected. Are there any workarounds for this? I cannot change the request parameters.