twilio-samples / speech-assistant-openai-realtime-api-python

MIT License

server VAD doesn't seem to pick up voice stopping #6

Closed robcontreras closed 3 days ago

robcontreras commented 3 weeks ago

I can connect successfully, and an event is received when I start speaking, but then it hangs here: it never detects that I have stopped, and it just stays there until the timeout is reached. Am I missing something?

Session updated successfully: {'type': 'session.updated', 'event_id': 'event_AF8eSFQtCeDPjLip7kbgJ', 'session': {'id': 'sess_AF8eRPbIlNNeKrx6ghnHj', 'object': 'realtime.session', 'model': 'gpt-4o-realtime-preview-2024-10-01', 'expires_at': 1728172439, 'modalities': ['text', 'audio'], 'instructions': 'You are a helpful and bubbly AI assistant who loves to chat about anything the user is interested in and is prepared to offer them facts. You have a penchant for dad jokes, owl jokes, and rickrolling – subtly. Always stay positive, but work in a joke when appropriate.', 'voice': 'alloy', 'turn_detection': {'type': 'server_vad', 'threshold': 0.5, 'prefix_padding_ms': 300, 'silence_duration_ms': 500}, 'input_audio_format': 'pcm16', 'output_audio_format': 'g711_ulaw', 'input_audio_transcription': None, 'tool_choice': 'auto', 'temperature': 0.8, 'max_response_output_tokens': 'inf', 'tools': []}}

Received event: input_audio_buffer.speech_started {'type': 'input_audio_buffer.speech_started', 'event_id': 'event_AF8eZJem71dtYCftbFvOP', 'audio_start_ms': 512, 'item_id': 'item_AF8eZKJuxUJmFcjQzCoPN'}

jhmaddox commented 2 weeks ago

I spent a bit of time looking at this and got interruption working quite well on my branch. Here are a few things:

Hope that helps!

pkamp3 commented 2 weeks ago

Just want to chime in that we're looking at interruptions/preemption on the node version at the moment: https://github.com/twilio-samples/speech-assistant-openai-realtime-api-node/issues/9 .

@jhmaddox you're using a queue and Mark message to determine when to send the next response.audio.delta to Twilio?
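For readers unfamiliar with the idea being asked about, here is a minimal sketch of mark-based tracking. It is not taken from jhmaddox's branch or from the PR. `MarkTracker`, `mark_for_chunk`, and the `chunk-N` names are hypothetical. It assumes only that Twilio Media Streams echoes back a `mark` event once the audio sent before that mark has played (or been cleared).

```python
from collections import deque


class MarkTracker:
    """Hypothetical helper: tracks audio chunks Twilio has not yet played."""

    def __init__(self):
        self._pending = deque()  # mark names sent but not yet echoed back
        self._counter = 0

    def mark_for_chunk(self, stream_sid):
        """Build the mark message to send right after an audio chunk."""
        self._counter += 1
        name = f"chunk-{self._counter}"  # illustrative naming scheme
        self._pending.append(name)
        return {"event": "mark", "streamSid": stream_sid, "mark": {"name": name}}

    def on_twilio_event(self, event):
        """Call for every message received from Twilio."""
        if event.get("event") == "mark" and self._pending:
            self._pending.popleft()

    def audio_in_flight(self):
        """True while some sent audio has not been confirmed as played."""
        return bool(self._pending)
```

In a handler, each `response.audio.delta` forwarded to Twilio would be followed by `mark_for_chunk(stream_sid)`, and `audio_in_flight()` could then decide whether an interruption needs to cut anything off.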

frmsaul commented 2 weeks ago

@robcontreras

I had success with doing this:

    if response['type'] == 'input_audio_buffer.speech_started':
        await websocket.send_json({"event": "clear", "streamSid": stream_sid})

It flushes the Twilio audio stream. Basically, it clears whatever audio you have already sent that is still buffered for playback.
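The same idea as a small standalone sketch that can be tested without a live call. The function names `twilio_clear_message` and `on_openai_event` are hypothetical. The event type and the `clear` message shape are the ones used in the snippet above. The guard on a missing `stream_sid` is an added assumption, since the stream SID is only known after Twilio's `start` event arrives.

```python
import json
from typing import Optional


def twilio_clear_message(stream_sid: str) -> dict:
    """Build the Twilio Media Streams message that empties the playback buffer."""
    return {"event": "clear", "streamSid": stream_sid}


def on_openai_event(response: dict, stream_sid: Optional[str]) -> Optional[str]:
    """Return the JSON text to send to Twilio, or None if nothing should be sent.

    Assumption: `response` is one parsed Realtime API server event.
    """
    if response.get("type") == "input_audio_buffer.speech_started" and stream_sid:
        # The caller started talking: drop any assistant audio still queued.
        return json.dumps(twilio_clear_message(stream_sid))
    return None
```

Inside the receive loop, a non-`None` return value would be passed to `websocket.send_text(...)`. The snippet above does the equivalent with `send_json` on the dict.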

pkamp3 commented 1 week ago

I have a PR here if anyone would like to test it: https://github.com/twilio-samples/speech-assistant-openai-realtime-api-python/pull/13 . It should work to within a reasonable margin of error around the point of your interruption. We're looking into this internally as well.

pkamp3 commented 3 days ago

https://github.com/twilio-samples/speech-assistant-openai-realtime-api-python/pull/13 closing with this merged