After the first chunk has started streaming, issuing `Cancel` on the token does not cancel the stream; it continues to yield the remaining chunks.
### To Reproduce
> [!IMPORTANT]
> This example uses the `AzureOpenAIClient`, but the `CancellationToken` behavior is inherited from the `OpenAIClient`.
**Failing xUnit reproduction code (this test should pass):**
```csharp
[Fact]
public async Task ItCancellationWorksAsExpectedAfterFirstChunkSuccessful2Async()
{
    // Arrange: mock an SSE response with two content chunks followed by [DONE].
    using var streamText = new MemoryStream(Encoding.UTF8.GetBytes(
        """
        data: {"id":"Eoo","object":"chat.completion.chunk","created":1711377846,"model":"gpt-4-0125-preview","system_fingerprint":"fp_a7daf7c51e","choices":[{"index":0,"delta":{"content":"Test chat streaming response"},"logprobs":null,"finish_reason":null}]}

        data: {"id":"Eoo","object":"chat.completion.chunk","created":1711377846,"model":"gpt-4-0125-preview","system_fingerprint":"fp_a7daf7c51e","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}

        data: [DONE]
        """));

    using var response = new HttpResponseMessage(HttpStatusCode.OK) { Content = new StreamContent(streamText) };
    this._messageHandlerStub.ResponsesToReturn.Add(response);

    var azureClient = new AzureOpenAIClient(
        new Uri("http://localhost"),
        new ApiKeyCredential("api-key"), // System.ClientModel credential, not a raw string
        new AzureOpenAIClientOptions { Transport = new HttpClientPipelineTransport(this._httpClient) });

    using var cancellationTokenSource = new CancellationTokenSource();
    cancellationTokenSource.CancelAfter(1000);

    var sut = azureClient.GetChatClient("mock-model");

    // Act & Assert
    var enumerator = sut.CompleteChatStreamingAsync(["Hello!"], cancellationToken: cancellationTokenSource.Token).GetAsyncEnumerator();
    await enumerator.MoveNextAsync();
    var firstChunk = enumerator.Current;
    Assert.False(cancellationTokenSource.IsCancellationRequested);

    await Task.Delay(1000);

    await Assert.ThrowsAsync<TaskCanceledException>(async () =>
    {
        // By now the token has been cancelled, so the next MoveNextAsync should throw.
        Assert.True(cancellationTokenSource.IsCancellationRequested);
        Assert.True(cancellationTokenSource.Token.IsCancellationRequested);
        await enumerator.MoveNextAsync();
        await enumerator.MoveNextAsync();
    });
}
```
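For context, the behavior the test expects matches how a plain C# async iterator propagates cancellation: with `[EnumeratorCancellation]`, once the token is cancelled, the next `MoveNextAsync` surfaces a `TaskCanceledException`. A minimal sketch of that expected semantics (this is illustrative code, not from the SDK; the generator name and chunk contents are made up):

```csharp
using System.Runtime.CompilerServices;

// Hypothetical generator: yields chunks until the token is cancelled.
static async IAsyncEnumerable<string> StreamChunksAsync(
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    foreach (var chunk in new[] { "first", "second", "third" })
    {
        // Task.Delay observes the token, so cancelling between chunks
        // causes the consumer's next MoveNextAsync to throw.
        await Task.Delay(100, cancellationToken);
        yield return chunk;
    }
}
```

The expectation in the bug report is that the SDK's streaming enumerator honors the token the same way between chunks, rather than draining the remaining response.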
- [x] Confirm this is not an issue with the OpenAI Python Library
- [x] Confirm this is not an issue with the underlying OpenAI API
- [x] Confirm this is not an issue with Azure OpenAI
### Describe the bug
We got an issue raised in Semantic Kernel that turned out to also be a potential bug in the SDK implementation.
### Code snippets
_No response_

### OS
Windows 11

### .NET version
.NET 8

### Library version
beta.11