openai / openai-python

The official Python library for the OpenAI API
https://pypi.org/project/openai/
Apache License 2.0
21.99k stars · 3.03k forks

Typing: when stream is completed, delta in ChatCompletionChunk from azure openai is None; should be ChoiceDelta #1677

Open JensMadsen opened 2 weeks ago

JensMadsen commented 2 weeks ago

Confirm this is an issue with the Python library and not an underlying OpenAI API

Describe the bug

When streaming from the Azure OpenAI API, the delta of the choice is None. In the OpenAI Python client v1.42.0, delta is typed as ChoiceDelta, i.e. not Optional.

To Reproduce

Run code along these lines:

    completion = await self._client.chat.completions.create(
        model=self.deployment.name,
        messages=cast(list[ChatCompletionMessageParam], messages),
        stream=True,
        temperature=temperature,
    )

    async for response_chunk in completion:
        ...

The types are:

  • response_chunk: ChatCompletionChunk
  • response_chunk.choices: list[Choice]
  • response_chunk.choices[0].delta: ChoiceDelta

The response from the Azure OpenAI API returns delta=None when the stream ends.

Response example:

Choice(delta=None, finish_reason=None ...........)

Code snippets

No response

OS

Linux, Ubuntu 20.04

Python version

3.12.1

Library version

openai v1.42.0

kristapratico commented 2 weeks ago

@JensMadsen could you share more information to help in reproducing this?

JensMadsen commented 2 weeks ago

> @JensMadsen could you share more information to help in reproducing this?
>
>   • what is the model you are using?
>   • which Azure OpenAI API version?
>   • what kind of deployment - standard, global, provisioned-managed?

@kristapratico I think I have identified what causes the incorrect types. I use the 2024-05-01-preview Azure API version (to use the Assistants API). When I switch back to 2023-05-15, it works as expected. I also see the type mismatch in, e.g., API version 2024-06-01. I have not thoroughly tested all versions; see: https://learn.microsoft.com/en-us/azure/ai-services/openai/api-version-deprecation.

kristapratico commented 2 weeks ago

@JensMadsen thanks. Unfortunately, I'm still missing something to reproduce this. Could you share the region your resource resides in and/or the prompt that causes this?

edit: Do you by chance have a custom content filter applied to the deployment with asynchronous filtering enabled?

JensMadsen commented 2 weeks ago

> @JensMadsen thanks. Unfortunately, I'm still missing something to reproduce this. Could you share the region your resource resides in and/or the prompt that causes this?
>
> edit: Do you by chance have a custom content filter applied to the deployment with asynchronous filtering enabled?

Yes, of course.

Region: Sweden. We have a content filter that I think is "custom" (see screenshot).

I see this with all prompts so far.

Again, using the older API version 2023-05-15 results in responses aligned with the types in the OpenAI Python client.

kristapratico commented 2 weeks ago

@JensMadsen got it. In your screenshot, it does look like the asynchronous content filter is enabled. With the async filter turned on, the Azure response is slightly altered to return information like content_filter_results and content_filter_offsets in the first and final streamed chunk (and omit sending delta):

data: {"id":"","object":"","created":0,"model":"","prompt_annotations":[{"prompt_index":0,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}}}],"choices":[],"usage":null} 

data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant"}}],"usage":null} 

data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":"Color"}}],"usage":null} 

data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":" is"}}],"usage":null} 

data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":" a"}}],"usage":null} 

... 

data: {"id":"","object":"","created":0,"model":"","choices":[{"index":0,"finish_reason":null,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"content_filter_offsets":{"check_offset":44,"start_offset":44,"end_offset":198}}],"usage":null} 

... 

data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":"stop","delta":{}}],"usage":null} 

data: {"id":"","object":"","created":0,"model":"","choices":[{"index":0,"finish_reason":null,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"content_filter_offsets":{"check_offset":506,"start_offset":44,"end_offset":571}}],"usage":null} 

data: [DONE] 

Source: https://learn.microsoft.com/azure/ai-services/openai/concepts/content-filter?tabs=warning%2Cuser-prompt%2Cpython-new#sample-response-stream-passes-filters

I'm following up with the team to try to understand the reason for this difference. You won't see this with the older version (2023-05-15) since content filter annotations weren't added to the API until 2023-06-01-preview and later. It looks like the async filter is still in preview and could be subject to change, so at the moment I think it might be best to write code that is resilient to this API difference. You're absolutely right that the typing is wrong for Azure in this case, but I believe that this discrepancy lies more on the service than the SDK.
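Until the discrepancy is resolved, one way to be resilient to these chunks is to treat an empty `choices` list or a `None` delta as "no text". This is a minimal sketch under that assumption; `extract_delta_content` is a hypothetical helper name, and `SimpleNamespace` objects stand in for real streamed chunks purely for illustration:

```python
from types import SimpleNamespace


def extract_delta_content(chunk) -> str:
    """Return the text carried by a streamed chunk, tolerating the
    Azure async-content-filter chunks where choices is empty or
    delta is None."""
    if not chunk.choices:           # annotation-only chunk (filter results)
        return ""
    delta = chunk.choices[0].delta
    if delta is None:               # content-filter-offsets chunk
        return ""
    return delta.content or ""      # delta.content may itself be None


# Stand-ins mimicking the three chunk shapes seen in the stream above:
normal = SimpleNamespace(
    choices=[SimpleNamespace(delta=SimpleNamespace(content="Color"))]
)
filter_chunk = SimpleNamespace(choices=[SimpleNamespace(delta=None)])
annotation_chunk = SimpleNamespace(choices=[])

print(extract_delta_content(normal))            # "Color"
print(extract_delta_content(filter_chunk))      # ""
print(extract_delta_content(annotation_chunk))  # ""
```

In a real consumer you would call the helper inside the `async for response_chunk in completion:` loop instead of indexing `choices[0].delta` directly.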