shreyan1999 closed this issue 4 months ago.
Hi, I replicated this issue.
Postman POST request to http://127.0.0.1:5272/v1/chat/completions with the JSON body:
{ "model": "Phi-3-mini-4k-directml-int4-awq-block-128-onnx", "messages": [ { "role": "user", "content": "Hi" } ], "temperature": 0.7, "top_p": 1, "top_k": 10, "max_tokens": 100, "stream": true }
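For anyone reproducing this outside Postman, here is a minimal Python sketch of the same request using only the standard library. It assumes the local server is listening on 127.0.0.1:5272 as above. `build_chat_request` is a hypothetical helper name, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Endpoint and model name are taken from the Postman request above.
URL = "http://127.0.0.1:5272/v1/chat/completions"


def build_chat_request(prompt: str, stream: bool) -> urllib.request.Request:
    """Build (but do not send) a POST request with the same JSON body as above."""
    body = {
        "model": "Phi-3-mini-4k-directml-int4-awq-block-128-onnx",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "top_p": 1,
        "top_k": 10,
        "max_tokens": 100,
        "stream": stream,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("Hi", stream=True)
# To actually send it (requires the local server to be running):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Toggling the `stream` argument is the only change needed to compare the two response shapes discussed below.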
The issue is that the response comes back split into many chunks rather than as a single message.
Additional update:
If you remove the parameters from "temperature" through "stream", you get this output, in which the content is duplicated:
{ "id": "chat.id.14", "created": 1718722757, "choices": [ { "message": { "role": "assistant", "content": " Hello! How can I assist you today?", "tool_calls": [] }, "logprobs": null, "index": 0, "finish_reason": "stop", "delta": { "role": "assistant", "content": " Hello! How can I assist you today?", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }
@shreyan1999 @leestott, the local API uses the OpenAI Chat Completion contract, which supports streaming.
So if you use "stream": true, the response is split into chunks; please read the delta field.
If you use "stream": false or remove "stream", the response is the whole message; please read the message field.
{ "id": "chat.id.1", "created": 1719684353, "choices": [ { "message": { "role": "assistant", "content": " Hello! How can I assist you today?", "tool_calls": [] }, "logprobs": null, "index": 0, "finish_reason": "stop", "delta": { "role": "assistant", "content": " Hello! How can I assist you today?", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }
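For the streaming case, a minimal Python sketch of the client side, assuming the response is read line by line as Server-Sent Events in the `data: {...}` form shown later in this thread. `collect_streamed_content` is a hypothetical helper, not part of any SDK; it simply joins every `delta.content` until the `[DONE]` marker.

```python
import json
from typing import Iterable


def collect_streamed_content(lines: Iterable[str]) -> str:
    """Concatenate delta.content from Server-Sent Events lines ("data: {...}")."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and any non-data fields
        payload = line[len("data:"):].strip()
        # OpenAI-style streams end with "data: [DONE]"; the log in this thread
        # shows "[DONE ]", so match on the prefix only.
        if payload.startswith("[DONE"):
            break
        chunk = json.loads(payload)
        # Each chunk in this thread carries a single choice.
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
    return "".join(parts)


# Abbreviated chunks modeled on the streamed response shown in this thread.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant", "content": " Hello"}, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {"role": "assistant", "content": "!"}, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {"role": "assistant", "content": ""}, "finish_reason": "stop"}]}',
    "data: [DONE ]",
]
print(collect_streamed_content(sample))  # prints " Hello!"
```

Splitting the text into one chunk per token is expected behavior with "stream": true; the client is responsible for stitching the pieces back together.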
Thank you for the resolution, @swatDong. Yes, the issue was resolved by setting the parameter to false. However, the response content still appears twice: once under choices->message->content and again under delta->content. We can use the first occurrence to extract the message content, but what exactly does the second occurrence signify?
@shreyan1999, when using "stream": false, please just ignore the delta section. Both occurrences are exactly the same.
I'll check internally for the duplicated content issue.
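Following that advice, a minimal Python sketch for the non-streaming case that reads only `choices[0].message.content` and ignores `delta`. It assumes a single choice per response, as in the outputs above; `extract_message_content` is a hypothetical helper name.

```python
import json


def extract_message_content(response_text: str) -> str:
    """Return the assistant text from a non-streaming ("stream": false) response."""
    data = json.loads(response_text)
    choice = data["choices"][0]
    # Read "message"; in this server's non-streaming output "delta" repeats the
    # same content and can be ignored.
    return choice["message"]["content"]


# Trimmed copy of the non-streaming response posted in this thread.
response = """{"id": "chat.id.1", "choices": [{"message": {"role": "assistant",
 "content": " Hello! How can I assist you today?", "tool_calls": []},
 "finish_reason": "stop", "delta": {"role": "assistant",
 "content": " Hello! How can I assist you today?", "tool_calls": []}}]}"""
print(extract_message_content(response))
```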
No activity for 2 weeks, closing this issue. Feel free to comment or reopen, and we will re-investigate.
Hello,
I have tried to make a simple POST call to the URL "http://127.0.0.1:5272/v1/chat/completions" with the following body:
{ "model": "Phi-3-mini-128k-cuda-int4-onnx", "messages": [ { "role": "user", "content": "Hi" } ], "temperature": 0.7, "top_p": 1, "top_k": 10, "max_tokens": 100, "stream": true }
When I make the API call, the response body is as follows:
data: { "id": "chat.id.16", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": " Hello", "tool_calls": [] }, "logprobs": null, "index": 0, "finish_reason": null, "delta": { "role": "assistant", "content": " Hello", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }
data: { "id": "chat.id.17", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": "!", "tool_calls": [] }, "logprobs": null, "index": 1, "finish_reason": null, "delta": { "role": "assistant", "content": "!", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }
data: { "id": "chat.id.18", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": " How", "tool_calls": [] }, "logprobs": null, "index": 2, "finish_reason": null, "delta": { "role": "assistant", "content": " How", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }
data: { "id": "chat.id.19", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": " can", "tool_calls": [] }, "logprobs": null, "index": 3, "finish_reason": null, "delta": { "role": "assistant", "content": " can", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }
data: { "id": "chat.id.20", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": " I", "tool_calls": [] }, "logprobs": null, "index": 4, "finish_reason": null, "delta": { "role": "assistant", "content": " I", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }
data: { "id": "chat.id.21", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": " assist", "tool_calls": [] }, "logprobs": null, "index": 5, "finish_reason": null, "delta": { "role": "assistant", "content": " assist", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }
data: { "id": "chat.id.22", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": " you", "tool_calls": [] }, "logprobs": null, "index": 6, "finish_reason": null, "delta": { "role": "assistant", "content": " you", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }
data: { "id": "chat.id.23", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": " today", "tool_calls": [] }, "logprobs": null, "index": 7, "finish_reason": null, "delta": { "role": "assistant", "content": " today", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }
data: { "id": "chat.id.24", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": "?", "tool_calls": [] }, "logprobs": null, "index": 8, "finish_reason": null, "delta": { "role": "assistant", "content": "?", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }
data: { "id": "chat.id.25", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": "", "tool_calls": [] }, "logprobs": null, "index": 9, "finish_reason": "stop", "delta": { "role": "assistant", "content": "", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }
data: [DONE ]
The response is split into a separate chunk for every word. However, when I remove the following section from the body: "temperature": 0.7, "top_p": 1, "top_k": 10, "max_tokens": 100, "stream": true
the response content comes back twice (under message and delta), without the words being split into chunks.
Tagging @leestott