rapidpro / mailroom

Backend services for the RapidPro platform.
Other
25 stars 28 forks source link

Add IVR Support #59

Closed nicpottier closed 5 years ago

nicpottier commented 5 years ago

Support for Twilio and Nexmo at launch.

Week of Jan 7th:

Week of Jan 14th:

Week of Jan 21st:

nicpottier commented 5 years ago

So a bit a design quandary around call retries (note for now I am ignoring call throttling to keep this conversation simple but that is a similar problem):

Right now mailroom does not create a FlowSession until a call has actually started, IE, we request for Twilio to make a call and when we finally get a callback from Twilio that the call is connected, then and only then do we start the Flow create the session etc..

The way this works in practice is we first create a ChannelConnection and call out to Twilio to initiate the call and in the callback URL have a reference to the start id that will be used to start the flow. This has all the other elements we need such as extra etc in order to properly start the flow session.

All fine and good and that feels correct to me, we aren't starting FlowSessions and creating FlowRuns until a call is actually connected, which seems right. Problem is around retries. Say instead of getting a callback from Twilio that the call is connected we instead get a callback that the line was busy, so we need to retry in five minutes. We can set the status of our ChannelConnection to E, increment our error_count and set next_attempt to five minutes from now. That makes it easy to trigger retries, every minute a cron can look for errored calls whose next_attempt is in the past and start over.

All good but at the time of the retry we no longer know the flow start id, so we don't know which flow to start, extra to use etc..

In the old world, we did this in reverse. We first create an empty FlowRun for each contact, then create the ChannelConnection and queue the call. Because of FlowRun has extra, parent etc it can do retries just fine. In contrast in the new world FlowRuns are really only created in response to a FlowSession, so that really doesn't work so well.

So possible solutions for this in Mailroom:

  1. do something similar as the old world with a throw away FlowRun. IE, create the flow run to track that we need to start the contact in that call, connect a ChannelConnection to it, and in the case of retries we have something that is keeping state of extra, parent etc.. When we do actually get connected blow away that FlowRun and create the real runs associated with our session. Seems very very jank and prone to error with these temporary runs hanging around.
  2. create a many to many between FlowStart and ChannelConnection. When creating ChannelConnections we make sure to associate that so we can look up extra, what flow to start etc, as needed on retries. The place this doesn't work is for trigger flow actions. We have said FlowStarts will only represent the user action of starting a flow, not every flow start, so for the case of an SMS flow starting an IVR flow via a session_triggered event, we don't have a persisted FlowStart to reference in our ChannelConnection. So we would have to revisit that decision and tweak our schema on FlowStart to perhaps say that FlowStarts can have a null created_by which indicates it is a system start? (note that we kind of have this problem already, in that session_triggered events do use FlowStarts tasks to get everybody started, but use a transient FlowStart to do so (ie, not persisted to the db) So could be said that having a way of representing that is better there as well. Downside is we are then potentially creating millions of new FlowStart objects / M2Ms. (think resistbot)
  3. create some new kind of model that represents a pending flow run. This is essentially taking what we need out of 1) and 2) above and only keeping the bits we want and need. To me this sure feels like FlowStart above though and I'd have a hard time coming up with a name for it.

Thoughts @dodobas @ericnewcomer @rowanseymour @norkans7 ?

ericnewcomer commented 5 years ago

My initial gut when reading the problem was to look at doing something in the vein of #2 so was leaning that way before reading it. Can you elaborate on the millions of objects scenario? Seems even in Resistbot land it’s not common to trigger contacts from one flow to another. The subflow case, which is common however, is unaffected.

As for retries, IVR generally operates at a different scale than messaging but I can appreciate some concern with making design designs around it staying that way.

rowanseymour commented 5 years ago

Definitely not a fan of 1. I don't think FlowRun.extra will even exist in the new world.

I think it has to be special "retry" kind of FlowStart (2) or a new model (PendingCall ?) (3). Maybe retry flow starts are deleted after they're needed? Something to be said for having a dedicated model for this.

nicpottier commented 5 years ago

Ya, I think resist only uses triggers when doing say cross channel / URN flows (claiming a whatsapp number or whatnot) so not super super common. This stuff will get cleaned up with archival so we really are only worrying about last 90 anyways.

If sure "feels" like flow start is the right place for this in general but ya wanted to talk through those options first.

The only thing that gives me pause is just that it is still a bit vague as to what the rules are as to when FlowStarts are created. We would be saying they get created on session trigger events, when a user manually starts a flow, but not when flows are started via a trigger like a missed call or keyword for example. So kinda goofy? I suppose session triggers share the property that the start may be on a group so maybe that makes sense.

nicpottier commented 5 years ago

@rowanseymour to be clear in the "typical" case of an IVR call being started by a user there isn't really a need for a "special" flow start there, one is already created from the UI, it is the backwards relation from a ChannelConnection to that FlowStart that we are missing. (and thus the need for the M2M from FlowStart to ChannelConnection)

nicpottier commented 5 years ago

(and yes, just having a foreign key from ChannelConnection to FlowStart would be the simplest but that seems like a wrong direction dependency)

nicpottier commented 5 years ago

Correction: archiver does NOT currently get rid of flow starts, but maybe we can do that for those that have NULL created_by going forward?

rowanseymour commented 5 years ago

Yeah.. there would be a nice simplicity to re-using FlowStart (maybe the rule is that anything that can't start the session synchronously uses a FlowStart). Cleanup of "non user initiated" flow starts could happen alongside cleanup of FlowSession objects, i.e. both considered temporary.

nicpottier commented 5 years ago

Ya I guess the synchronous way of looking at it seems right. Ok.. so are we ok with making created_by and modified_by nullable on FlowStart and adding a M2M to ChannelConnection?

ericnewcomer commented 5 years ago

Feels right to me!