twilio / twilio-voice-react-native


OpenAI x AMD x Realtime #449

Open florenciocvm opened 6 days ago

florenciocvm commented 6 days ago

A little bit off-topic here, but I hope an engineer ends up reading this.

My use case involves an outbound call to a company's customer service line: the call interacts with the IVR, then switches to realtime once a human picks up after the IVR dialogs have been traversed.

To be able to achieve this:

  1. I start a call with AMD (answering machine detection) turned on and use OpenAI's Realtime HTTP requests to get audio-to-text responses without needing to transcribe first.
  2. I have a few heuristics in the prompt to identify the moment of the switch from machine to human. I am considering adding LiveKit for VAD in the near future.
  3. I switch to WebSocket Realtime when the human picks up.
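For illustration, the hand-off in steps 2 and 3 can be sketched as a small one-way state machine. This is only a sketch under assumptions: the names `Speaker`, `HandoffState`, `nextState`, and `requiredStreak` are hypothetical and not part of any Twilio or OpenAI SDK, and it assumes the prompt heuristics label each audio turn as machine, human, or unsure.

```typescript
// Hypothetical sketch: none of these names come from a Twilio or OpenAI SDK.
type Phase = "ivr" | "human";
// Assumed per-turn label produced by the prompt heuristics in step 2.
type Speaker = "machine" | "human" | "unsure";

interface HandoffState {
  phase: Phase;
  humanStreak: number; // consecutive "human" labels seen so far
}

const initialState: HandoffState = { phase: "ivr", humanStreak: 0 };

// Returns the state after one labelled turn. The switch to "human" is
// one-way: once it happens, later labels are ignored.
function nextState(
  state: HandoffState,
  speaker: Speaker,
  requiredStreak: number = 2
): HandoffState {
  if (state.phase === "human") {
    return state;
  }
  let humanStreak = state.humanStreak;
  if (speaker === "human") {
    humanStreak += 1;
  } else if (speaker === "machine") {
    humanStreak = 0; // a machine turn resets the evidence
  }
  // "unsure" leaves the streak unchanged.
  const phase: Phase = humanStreak >= requiredStreak ? "human" : "ivr";
  return { phase, humanStreak };
}
```

Requiring more than one consecutive "human" label is one possible way to reduce false switches caused by a single misclassified IVR prompt; the threshold is an assumption and would need tuning against real calls.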

All of this is needed because Twilio's Stream does not support the IVR response.play({digits}) command. If it did, I could eliminate steps 1 and 2. There is also significant lag in step 1: it still works, but not optimally. And because step 2 is heuristic-based, it is suboptimal by definition.
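For context, the command in question produces a TwiML `<Play>` verb with a `digits` attribute, which sends DTMF tones on a call that is being served plain TwiML. The sketch below builds that document by hand; `playDigitsTwiml` is a hypothetical helper, not a Twilio API, and the allowed character set (digits, `#`, `*`, and `w`/`W` for pauses) reflects my reading of the TwiML docs. It says nothing about how this would behave alongside an active Stream.

```typescript
// Hypothetical helper: builds the TwiML that response.play({digits}) would
// emit, without depending on the twilio package.
function playDigitsTwiml(digits: string): string {
  // Assumption: DTMF sequences may contain 0-9, #, *, and w/W pause markers.
  if (!/^[0-9#*wW]+$/.test(digits)) {
    throw new Error(`Unsupported DTMF sequence: ${digits}`);
  }
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Play digits="${digits}"/></Response>`
  );
}

// Example: pause briefly, then press 1 followed by #.
const ivrChoice: string = playDigitsTwiml("ww1#");
```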

As a fellow engineer (and recent stockholder, betting we will surf this AI-wave), I surely hope the team is already working hard to make this feature available soon.

Regards,

bobiechen-twilio commented 3 days ago

Hi @florenciocvm

Thanks for reaching out. Since this sounds more like a question about connecting components between TwiML and callbacks, I would recommend contacting the [Twilio Help Center](https://help.twilio.com/) so our experts can help you achieve the machine-to-human transition at the right time. (I believe it's possible, but I'd defer to other experts who know this part better than I do.)