openai / openai-dotnet

The official .NET library for the OpenAI API
https://www.nuget.org/packages/OpenAI
MIT License

(Beta) Websocket closing unexpectedly on ReceiveUpdatesAsync #244

Open camelCase12 opened 4 weeks ago

camelCase12 commented 4 weeks ago

Service

OpenAI

Describe the bug

I was setting up the realtime API for voice chat, but after my first question and the model's reply, the WebSocket would close and end the conversation. (This was while using CreateServerVoiceActivityTurnDetectionOptions.)

When calling ReceiveUpdatesAsync to receive server messages, I didn't expect this to close the _clientWebSocket, but it did.

It took me a while to figure out that AsyncWebsocketMessageResultEnumerator automatically disposes the _clientWebSocket when it finishes iterating, which is how ReceiveUpdatesAsync ends up terminating the WebSocket. Commenting out the dispose call within AsyncWebsocketMessageResultEnumerator resulted in the expected behavior for me:

public ValueTask DisposeAsync()
{
    //_clientWebSocket?.Dispose();
    return new ValueTask(Task.CompletedTask);
}

This allowed me to have my expected 2-way conversation. If there's another intended method to receive server events without terminating the socket, or to keep the socket alive for more than one request-response, please let me know.
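For readers unfamiliar with the mechanism: `await foreach` calls `DisposeAsync` on the enumerator whenever the loop exits, so an enumerator that disposes a shared resource there will close it after the first pass. Below is a minimal sketch of that behavior. `FakeSocket` and `FakeUpdates` are hypothetical stand-ins written for illustration, not types from the OpenAI library, and the enumerator only imitates the disposal described above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var socket = new FakeSocket();

// The loop ends once MoveNextAsync returns false; `await foreach` then
// calls DisposeAsync on the enumerator before continuing.
await foreach (var message in new FakeUpdates(socket, count: 2))
{
    Console.WriteLine(message);
}

Console.WriteLine($"Socket disposed after first loop: {socket.IsDisposed}");

// Hypothetical stand-in for a socket-like resource.
sealed class FakeSocket : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() => IsDisposed = true;
}

// Hypothetical enumerable whose enumerator disposes the socket it was handed,
// imitating the behavior described in this report.
sealed class FakeUpdates : IAsyncEnumerable<string>
{
    private readonly FakeSocket _socket;
    private readonly int _count;

    public FakeUpdates(FakeSocket socket, int count)
    {
        _socket = socket;
        _count = count;
    }

    public IAsyncEnumerator<string> GetAsyncEnumerator(CancellationToken cancellationToken = default)
        => new Enumerator(_socket, _count);

    private sealed class Enumerator : IAsyncEnumerator<string>
    {
        private readonly FakeSocket _socket;
        private readonly int _count;
        private int _index;

        public Enumerator(FakeSocket socket, int count)
        {
            _socket = socket;
            _count = count;
        }

        public string Current => $"update {_index}";

        public ValueTask<bool> MoveNextAsync()
        {
            if (_socket.IsDisposed) throw new ObjectDisposedException(nameof(FakeSocket));
            _index++;
            return new ValueTask<bool>(_index <= _count);
        }

        public ValueTask DisposeAsync()
        {
            _socket.Dispose(); // disposes a resource the enumerator did not create
            return default;
        }
    }
}
```

Running this prints the two updates followed by `Socket disposed after first loop: True`; a second `await foreach` over the same `FakeSocket` would throw `ObjectDisposedException` on its first `MoveNextAsync`.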

Steps to reproduce

  1. Initialize a RealtimeConversationSession with server voice activity turn detection:
```csharp
var client = new RealtimeConversationClient(model: "gpt-4o-realtime-preview-2024-10-01", new(apiKey));

CancellationTokenSource cts = new();

var session = await client.StartConversationSessionAsync(cts.Token);

var options = new ConversationSessionOptions()
{
    Instructions = "",
    TurnDetectionOptions = ConversationTurnDetectionOptions.CreateServerVoiceActivityTurnDetectionOptions(
        0.5f, TimeSpan.FromMilliseconds(300), TimeSpan.FromMilliseconds(200)),
    Voice = ConversationVoice.Alloy,
    OutputAudioFormat = ConversationAudioFormat.Pcm16,
    InputTranscriptionOptions = new ConversationInputTranscriptionOptions() { Model = "whisper-1" }
};

await session.ConfigureSessionAsync(options);
```
  2. Begin sending audio through SendAudioAsync (in my case with NAudio Wave):
```csharp
waveIn.DataAvailable += (s, a) =>
{
    using var memoryStream = new MemoryStream();
    memoryStream.Write(a.Buffer, 0, a.BytesRecorded);
    memoryStream.Position = 0;
    session.SendAudioAsync(memoryStream, token).Wait();
};
```
  3. Begin handling server responses with ReceiveUpdatesAsync in a loop:
```csharp
while (true)
{
    await foreach (var update in session.ReceiveUpdatesAsync(token))
    {
        //Handle received updates
    }
}
```
  4. Make an audible request to the AI, and wait for its response to complete. On the second loop, ReceiveUpdatesAsync will throw a System.ObjectDisposedException:
```
Cannot access a disposed object.
Object name: 'System.Net.WebSockets.ClientWebSocket'.
```

Code snippets

No response

OS

Windows

.NET version

.NET 8

Library version

2.1.0-beta.1

edo4444 commented 2 weeks ago

I don't know if it helps, but any unhandled exception in your code will terminate the WebSocket. Been there.
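One general C# behavior is consistent with this: an exception that escapes the body of an `await foreach` exits the loop, and the enumerator's `DisposeAsync` runs on the way out. A minimal sketch of catching handler errors inside the loop body so the enumeration keeps going is shown below. `FakeUpdatesAsync` and `Handle` are hypothetical stand-ins, not library calls, and whether this matches the case described above is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

int handled = 0, failed = 0;

await foreach (var update in FakeUpdatesAsync())
{
    try
    {
        Handle(update);
        handled++;
    }
    catch (Exception ex)
    {
        // Caught inside the loop body, so the loop (and its enumerator) stays alive.
        failed++;
        Console.WriteLine($"Handler failed for '{update}': {ex.Message}");
    }
}

Console.WriteLine($"handled={handled}, failed={failed}");

// Hypothetical handler that fails on one specific update.
static void Handle(string update)
{
    if (update == "bad") throw new InvalidOperationException("cannot process update");
    Console.WriteLine($"Handled '{update}'");
}

// Hypothetical stand-in for a stream of server updates.
static async IAsyncEnumerable<string> FakeUpdatesAsync()
{
    foreach (var update in new[] { "first", "bad", "last" })
    {
        await Task.Yield();
        yield return update;
    }
}
```

With the `try`/`catch` in place, all three updates are visited and the final line reads `handled=2, failed=1`. Without it, the exception from the second update would end the loop before "last" is seen.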

isaac-j-miller commented 2 weeks ago

I am having the same issue. Because the AsyncWebsocketMessageResultEnumerator disposes the client websocket, I cannot do anything with the websocket after iterating through the responses. I don't think AsyncWebsocketMessageResultEnumerator should be responsible for disposing the client websocket, because it doesn't create it. The RealtimeConversationSession class should be solely responsible for this, IMO.
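The ownership split suggested here can be sketched as follows: the object that creates the socket disposes it, and the update stream only borrows it. `FakeSession` and `FakeSocket` are hypothetical stand-ins for illustration. This is not the library's code and not the content of any PR.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// The session creates the socket, so the session is the one that disposes it.
await using var session = new FakeSession();

for (int pass = 1; pass <= 2; pass++)
{
    await foreach (var update in session.ReceiveUpdatesAsync(count: 2))
    {
        Console.WriteLine($"pass {pass}: {update}");
    }
    Console.WriteLine($"socket disposed after pass {pass}: {session.SocketDisposed}");
}

// Hypothetical stand-in for a socket-like resource.
sealed class FakeSocket : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() => IsDisposed = true;
}

// Hypothetical session that owns the socket for its whole lifetime.
sealed class FakeSession : IAsyncDisposable
{
    private readonly FakeSocket _socket = new();

    public bool SocketDisposed => _socket.IsDisposed;

    // The iterator borrows the socket and never disposes it.
    public async IAsyncEnumerable<string> ReceiveUpdatesAsync(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            if (_socket.IsDisposed) throw new ObjectDisposedException(nameof(FakeSocket));
            await Task.Yield();
            yield return $"update {i}";
        }
    }

    public ValueTask DisposeAsync()
    {
        _socket.Dispose();
        return default;
    }
}
```

Both passes complete and report `False` for the disposed flag; the socket is released only when the session itself is disposed at the end of the program.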

isaac-j-miller commented 2 weeks ago

I've opened PR #261 to address this.