microsoft / vscode-ai-toolkit

MIT License
896 stars 41 forks

Issue with POSTMAN API Calling #80

Closed shreyan1999 closed 2 months ago

shreyan1999 commented 3 months ago

Hello,

I tried to make a simple POST call to the URL "http://127.0.0.1:5272/v1/chat/completions" with the following body:

{ "model": "Phi-3-mini-128k-cuda-int4-onnx", "messages": [ { "role": "user", "content": "Hi" } ], "temperature": 0.7, "top_p": 1, "top_k": 10, "max_tokens": 100, "stream": true }

When I make the API call, the response body is as follows:

data: { "id": "chat.id.16", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": " Hello", "tool_calls": [] }, "logprobs": null, "index": 0, "finish_reason": null, "delta": { "role": "assistant", "content": " Hello", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }

data: { "id": "chat.id.17", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": "!", "tool_calls": [] }, "logprobs": null, "index": 1, "finish_reason": null, "delta": { "role": "assistant", "content": "!", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }

data: { "id": "chat.id.18", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": " How", "tool_calls": [] }, "logprobs": null, "index": 2, "finish_reason": null, "delta": { "role": "assistant", "content": " How", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }

data: { "id": "chat.id.19", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": " can", "tool_calls": [] }, "logprobs": null, "index": 3, "finish_reason": null, "delta": { "role": "assistant", "content": " can", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }

data: { "id": "chat.id.20", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": " I", "tool_calls": [] }, "logprobs": null, "index": 4, "finish_reason": null, "delta": { "role": "assistant", "content": " I", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }

data: { "id": "chat.id.21", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": " assist", "tool_calls": [] }, "logprobs": null, "index": 5, "finish_reason": null, "delta": { "role": "assistant", "content": " assist", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }

data: { "id": "chat.id.22", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": " you", "tool_calls": [] }, "logprobs": null, "index": 6, "finish_reason": null, "delta": { "role": "assistant", "content": " you", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }

data: { "id": "chat.id.23", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": " today", "tool_calls": [] }, "logprobs": null, "index": 7, "finish_reason": null, "delta": { "role": "assistant", "content": " today", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }

data: { "id": "chat.id.24", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": "?", "tool_calls": [] }, "logprobs": null, "index": 8, "finish_reason": null, "delta": { "role": "assistant", "content": "?", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }

data: { "id": "chat.id.25", "created": 1718722897, "choices": [ { "message": { "role": "assistant", "content": "", "tool_calls": [] }, "logprobs": null, "index": 9, "finish_reason": "stop", "delta": { "role": "assistant", "content": "", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }

data: [DONE]

The response is split into multiple chunks, one per word. However, when I remove the following section from the body: "temperature": 0.7, "top_p": 1, "top_k": 10, "max_tokens": 100, "stream": true

the response is no longer split into chunks, but the content appears twice.

Tagging @leestott

leestott commented 3 months ago

Hi, I have replicated this issue.

Postman post method http://127.0.0.1:5272/v1/chat/completions

JSON { "model": "Phi-3-mini-4k-directml-int4-awq-block-128-onnx", "messages": [ { "role": "user", "content": "Hi" } ], "temperature": 0.7, "top_p": 1, "top_k": 10, "max_tokens": 100, "stream": true }

The issue is that the response comes back in multiple chunks rather than as a single line.

leestott commented 3 months ago

Additional update

If you remove the temperature/top_p/top_k/max_tokens/stream section, you get the following output, in which the content appears twice.

{ "id": "chat.id.14", "created": 1718722757, "choices": [ { "message": { "role": "assistant", "content": " Hello! How can I assist you today?", "tool_calls": [] }, "logprobs": null, "index": 0, "finish_reason": "stop", "delta": { "role": "assistant", "content": " Hello! How can I assist you today?", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }

swatDong commented 3 months ago

@shreyan1999 @leestott , the local API uses the OpenAI Chat Completions contract, which supports streaming.

So when using "stream": true, the response is split into chunks; please read the delta field. When using "stream": false or omitting "stream", the response is the whole message; please read the message field.
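
For anyone calling the endpoint from code rather than Postman, here is a minimal Python sketch (using the requests library) of how the two modes can be consumed, based on the response shapes shown above. The URL and model name are copied from this issue and may differ on your machine; the helper function names are just illustrative.

```python
import json
import requests

# URL and model name taken from this issue; adjust for your local setup.
URL = "http://127.0.0.1:5272/v1/chat/completions"
MODEL = "Phi-3-mini-128k-cuda-int4-onnx"

def chat_streaming(prompt: str) -> str:
    """"stream": true -> read each SSE chunk and concatenate delta.content."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,
        "stream": True,
    }
    parts = []
    with requests.post(URL, json=body, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data:"):
                continue
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                break
            chunk = json.loads(payload)
            # In streaming mode, the incremental text lives in the delta field.
            parts.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(parts)

def chat_non_streaming(prompt: str) -> str:
    """"stream": false (or omitted) -> read the full reply from message.content."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,
        "stream": False,
    }
    resp = requests.post(URL, json=body)
    resp.raise_for_status()
    # In non-streaming mode, use the message field and ignore delta.
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat_streaming("Hi"))
    print(chat_non_streaming("Hi"))
```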

shreyan1999 commented 3 months ago

{ "id": "chat.id.1", "created": 1719684353, "choices": [ { "message": { "role": "assistant", "content": " Hello! How can I assist you today?", "tool_calls": [] }, "logprobs": null, "index": 0, "finish_reason": "stop", "delta": { "role": "assistant", "content": " Hello! How can I assist you today?", "tool_calls": [] } } ], "prompt_filter_results": [], "usage": null }

Thank you for the resolution @swatDong. Yes, the issue was resolved by setting the parameter to false. However, the response still appears twice, once under choices -> message -> content and again under delta -> content. Since we can use the first occurrence to extract the content, what exactly does the second occurrence indicate?

swatDong commented 3 months ago

@shreyan1999 - when using stream: false, please just ignore the delta section. Both occurrences are exactly the same.

I'll check internally for the duplicated content issue.

swatDong commented 2 months ago

No activity for 2 weeks, closing this issue. Feel free to comment or reopen, and we will re-investigate.