livekit / sip

SIP to WebRTC bridge for LiveKit
Outbound Call Bot Starts Talking Before Call is Picked Up #114

Open akshatvg opened 2 months ago

akshatvg commented 2 months ago

Description: I'm experiencing an issue with the outbound call feature where the bot starts talking before the person picks up the call. I am using Twilio SIP for making these calls. Ideally, I want the bot to start its interaction only after the call has been answered. Below are the details of my setup:

Steps to Reproduce:

  1. Configure an outbound call using Twilio SIP.
  2. Initiate the call.
  3. Observe that the bot starts talking before the recipient picks up the call.

Expected Behavior: The bot should begin speaking only after the recipient has picked up the call.

Actual Behavior: The bot starts talking immediately after the call is initiated, even if the recipient hasn't picked up the call yet.

Environment:

livekit==0.9.2
livekit-agents==0.4.0
livekit-api==0.4.3
livekit-protocol==0.3.2

Possible Workaround: Is there any configuration in LiveKit or Twilio SIP that I might have missed which could delay the bot's speech until the call is picked up?

Thank you for your assistance in resolving this issue.

dennwc commented 2 months ago

Indeed, SIP creates a room participant before the call is established and may play a ringtone to the room during this time. Once the call is picked up, it switches the audio over to the call.

Thus, this problem can be solved in a few ways:

  1. SIP should not create the participant before the call is established. It's unlikely that we will implement this, because we'd still want to see some visual indication of the call being made.
  2. SIP can publish call status updates. This will require the Agents framework to also wait for a certain status before starting the actual processing.
  3. If the ringtone is disabled, SIP could avoid publishing tracks until the call is connected. This might be the cleanest way. Agents might also need to wait for the track.

cc @keepingitneil

davidzhao commented 2 months ago

I think we need a bit of explicit signaling (perhaps via the new attributes system) to signal that a participant is ready to receive agent speech.

For example, participant.attributes['readyForAgent'] = true

For a regular participant, we could set this when the user has subscribed to the agent's audio. For a SIP participant, it could be set once the call is established.
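
A minimal sketch of how an agent might check such a flag, assuming attributes arrive as a string-to-string mapping so that the flag is the string "true". The `readyForAgent` key comes from the example above; the helper name `is_ready_for_agent` is hypothetical and not part of any LiveKit API.

```python
from typing import Mapping

# Attribute name taken from the proposal above; nothing in the thread
# confirms that SIP actually sets it.
READY_KEY = "readyForAgent"


def is_ready_for_agent(attributes: Mapping[str, str]) -> bool:
    """Return True when the (hypothetical) readiness flag is set.

    Assumption: attribute values are strings, so the flag is compared
    against the text "true" rather than a boolean.
    """
    return attributes.get(READY_KEY, "").strip().lower() == "true"
```

An agent would hold back its first utterance until this returns True for the participant it is talking to, re-checking whenever the participant's attributes change.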

davidzhao commented 1 month ago

@dennwc now that we have an event to notify a publisher when their track has been subscribed to, I think we should do the following on the SIP side:

  1. SIP joins the room early, but does not subscribe to any tracks.
  2. When the phone is picked up and the call is negotiated, SIP subscribes to the audio tracks in the room.
  3. The agent detects that its track has been subscribed to and starts interacting then.
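
The agent side of this sequence can be sketched with plain `asyncio`. This is only an illustration under assumptions: `SubscriptionGate`, `on_track_subscribed`, and `demo` are hypothetical names, and the real agent would call `on_track_subscribed` from whatever "track subscribed" event its SDK exposes, which the thread does not name.

```python
import asyncio
from typing import List, Optional


class SubscriptionGate:
    """Holds the agent back until its audio track has a subscriber."""

    def __init__(self) -> None:
        self._subscribed = asyncio.Event()

    def on_track_subscribed(self) -> None:
        # Hypothetical hook: call this from the SDK's subscription event.
        self._subscribed.set()

    async def wait_until_subscribed(self, timeout: Optional[float] = None) -> bool:
        """Return True once subscribed, or False if the timeout expires."""
        try:
            await asyncio.wait_for(self._subscribed.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False


async def demo() -> List[str]:
    """Simulate a call that is answered shortly after the agent joins."""
    gate = SubscriptionGate()
    log: List[str] = []

    async def agent() -> None:
        log.append("agent joined, waiting")
        if await gate.wait_until_subscribed(timeout=1.0):
            log.append("agent speaks")

    task = asyncio.create_task(agent())
    await asyncio.sleep(0.01)  # the phone is still ringing
    log.append("call answered")
    gate.on_track_subscribed()  # SIP subscribes after pickup
    await task
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The timeout gives the agent a way to give up if the call is never answered, instead of waiting forever.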

dennwc commented 1 month ago

Nice! I will make the necessary changes then. SIP will still have to support both early and late track publishing, though. We still need to publish earlier if the play_ringtone flag is set.

davidzhao commented 1 month ago

I think SIP should always publish ASAP, but it should not subscribe to remote tracks until the user has picked up?

dennwc commented 1 month ago

Sorry, I misread your comment. Sure, makes total sense :+1: