Closed nicpottier closed 5 years ago
So a bit a design quandary around call retries (note for now I am ignoring call throttling to keep this conversation simple but that is a similar problem):
Right now mailroom does not create a FlowSession until a call has actually started, IE, we request for Twilio to make a call and when we finally get a callback from Twilio that the call is connected, then and only then do we start the Flow create the session etc..
The way this works in practice is we first create a ChannelConnection and call out to Twilio to initiate the call and in the callback URL have a reference to the start id that will be used to start the flow. This has all the other elements we need such as extra
etc in order to properly start the flow session.
All fine and good and that feels correct to me, we aren't starting FlowSessions and creating FlowRuns until a call is actually connected, which seems right. Problem is around retries. Say instead of getting a callback from Twilio that the call is connected we instead get a callback that the line was busy, so we need to retry in five minutes. We can set the status of our ChannelConnection to E
, increment our error_count
and set next_attempt
to five minutes from now. That makes it easy to trigger retries, every minute a cron can look for errored calls whose next_attempt
is in the past and start over.
All good but at the time of the retry we no longer know the flow start id, so we don't know which flow to start, extra
to use etc..
In the old world, we did this in reverse. We first create an empty FlowRun for each contact, then create the ChannelConnection and queue the call. Because of FlowRun has extra
, parent
etc it can do retries just fine. In contrast in the new world FlowRuns are really only created in response to a FlowSession, so that really doesn't work so well.
So possible solutions for this in Mailroom:
extra
, parent
etc.. When we do actually get connected blow away that FlowRun and create the real runs associated with our session. Seems very very jank and prone to error with these temporary runs hanging around.extra
, what flow to start etc, as needed on retries. The place this doesn't work is for trigger flow actions. We have said FlowStarts will only represent the user action of starting a flow, not every flow start, so for the case of an SMS flow starting an IVR flow via a session_triggered
event, we don't have a persisted FlowStart to reference in our ChannelConnection. So we would have to revisit that decision and tweak our schema on FlowStart to perhaps say that FlowStarts can have a null created_by
which indicates it is a system start? (note that we kind of have this problem already, in that session_triggered
events do use FlowStarts tasks to get everybody started, but use a transient FlowStart to do so (ie, not persisted to the db) So could be said that having a way of representing that is better there as well. Downside is we are then potentially creating millions of new FlowStart objects / M2Ms. (think resistbot)Thoughts @dodobas @ericnewcomer @rowanseymour @norkans7 ?
My initial gut when reading the problem was to look at doing something in the vein of #2 so was leaning that way before reading it. Can you elaborate on the millions of objects scenario? Seems even in Resistbot land it’s not common to trigger contacts from one flow to another. The subflow case, which is common however, is unaffected.
As for retries, IVR generally operates at a different scale than messaging but I can appreciate some concern with making design designs around it staying that way.
Definitely not a fan of 1. I don't think FlowRun.extra
will even exist in the new world.
I think it has to be special "retry" kind of FlowStart
(2) or a new model (PendingCall
?) (3). Maybe retry flow starts are deleted after they're needed? Something to be said for having a dedicated model for this.
Ya, I think resist only uses triggers when doing say cross channel / URN flows (claiming a whatsapp number or whatnot) so not super super common. This stuff will get cleaned up with archival so we really are only worrying about last 90 anyways.
If sure "feels" like flow start is the right place for this in general but ya wanted to talk through those options first.
The only thing that gives me pause is just that it is still a bit vague as to what the rules are as to when FlowStarts are created. We would be saying they get created on session trigger events, when a user manually starts a flow, but not when flows are started via a trigger like a missed call or keyword for example. So kinda goofy? I suppose session triggers share the property that the start may be on a group so maybe that makes sense.
@rowanseymour to be clear in the "typical" case of an IVR call being started by a user there isn't really a need for a "special" flow start there, one is already created from the UI, it is the backwards relation from a ChannelConnection to that FlowStart that we are missing. (and thus the need for the M2M from FlowStart to ChannelConnection)
(and yes, just having a foreign key from ChannelConnection to FlowStart would be the simplest but that seems like a wrong direction dependency)
Correction: archiver does NOT currently get rid of flow starts, but maybe we can do that for those that have NULL created_by
going forward?
Yeah.. there would be a nice simplicity to re-using FlowStart
(maybe the rule is that anything that can't start the session synchronously uses a FlowStart
). Cleanup of "non user initiated" flow starts could happen alongside cleanup of FlowSession
objects, i.e. both considered temporary.
Ya I guess the synchronous way of looking at it seems right. Ok.. so are we ok with making created_by
and modified_by
nullable on FlowStart and adding a M2M to ChannelConnection?
Feels right to me!
Support for Twilio and Nexmo at launch.
Week of Jan 7th:
Week of Jan 14th:
Week of Jan 21st: