Open ymuichiro opened 5 days ago
Hi @ymuichiro, thank you for your contribution!
If you read the comments made to this particular _inner_get_streaming_chat_message_contents
method, you will see the reason why stream_option
is not allowed with Azure OpenAI.
Did you observe a different behavior with Azure OpenAI?
@TaoChenOSU Of course. I have verified this for each API version. In my environment, regardless of which API version I choose, no errors occur, and the token usage for the stream is returned.
Am I misunderstanding something? It does indeed feel odd that it works even with older API versions.
↓ success versions and sample code
2024-10-01-preview
2024-09-01-preview
2024-07-01-preview
2024-10-21
2024-06-01
The following was created directly in the shell to prevent any misunderstandings due to other causes, but even when using Semantic Kernel, the same error could not be reproduced.
payload="{\
\"messages\": [\
{\
\"role\": \"user\",\
\"content\": [\
{\
\"type\": \"text\",\
\"text\": \"hi.\"\
}\
]\
}\
],\
\"temperature\": 0.7,\
\"top_p\": 0.95,\
\"stream\": true,\
\"stream_options\": { \"include_usage\": true },\
\"max_tokens\": 10\
}"
curl "https://${***********************}.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-10-21" \
-H "Content-Type: application/json" \
-H "api-key: **************************" \
-d "$payload"
async def stream_sample() -> None:
kernel = sk.Kernel()
service_id: str = "dummy"
kernel.add_service(
AzureChatCompletion(
service_id=service_id,
deployment_name=AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME,
endpoint=AZURE_OPENAI_COMPLETION_ENDPOINT,
api_key=AZURE_OPENAI_COMPLETION_API_KEY,
api_version="2024-06-01",
)
)
service = kernel.get_service(service_id=service_id)
settings = service.get_prompt_execution_settings_class()(service_id=service_id)
if isinstance(settings, AzureChatPromptExecutionSettings):
settings.extra_body = {
"stream_options": {
"include_usage": True,
}
}
history = ChatHistory()
history.add_user_message("hello")
async for chunk in service.get_streaming_chat_message_contents(
chat_history=history,
settings=settings,
kernel=kernel,
arguments=KernelArguments(settings=settings),
):
print(chunk)
Sorry, I used the wrong account but it's the same person.
Hi @ymuichiro,
I just verified. Seems like they have resolved the issue. Could you remove the override of _inner_get_streaming_chat_message_contents
in AzureChatCompletion
? The default implementation is already in OpenAIChatCompletionBase
which handles streaming tokens correctly.
hi @TaoChenOSU
sure, is this ok? I have confirmed that it works.
hi @TaoChenOSU
sure, is this ok? I have confirmed that it works.
Yes, this is right. Just one minor comment and we are good!
We have a unit test failure because of the change. Could you fix that too?
@TaoChenOSU
Sorry. I missed it. I found out that stream_options is forced, so I added the argument to the test code. I verified that the test runs successfully locally.
Motivation and Context
issue: https://github.com/microsoft/semantic-kernel/issues/9751
This pull request addresses a bug where setting
stream_options.include_usage
toTrue
does not return token usage, resulting inNone
for theusage
field.The issue occurs when using Azure OpenAI's GPT-4o and GPT-4omini models. In particular, if the last chunk of the response has an empty
choices
list, the chunk is skipped entirely, and the token usage is not processed correctly.In the Azure OpenAI implementation, if
usage
information is included, the chunk should be processed appropriately. However, the current code skips processing whenchoices
is empty. This pull request fixes this behavior so that the chunk is processed whenusage
is present, even ifchoices
is empty.Description
This fix includes the following changes:
azure_chat_completion.py
to ensure that chunks with emptychoices
are not skipped ifusage
information is present.if len(chunk.choices) == 0:
was updated to allow chunks withusage
data to be processed correctly.With these changes, setting
stream_options.include_usage
toTrue
will correctly return token usage data, even for chunks where thechoices
list is empty.Contribution Checklist