nyaruka / goflow

Flow engine for RapidPro/TextIt.
Other
43 stars 19 forks source link

Re-evaluating the action > result > router model #774

Closed rowanseymour closed 4 years ago

rowanseymour commented 4 years ago

Current approach

For webhook calls in the engine, we introduced a new pattern for functionality which in the old engine would have been handled by custom routing logic.

  1. Call webhook action generates a result
  2. Switch router uses expressions to route on it
  3. Switch router doesn’t save its usual result

Advantages:

Disadvantages:

Trying to make this work for NLU introduces another disadvantage:

Other approaches

Custom routers

Essentially the legacy flow approach - introduce custom router types. The logic for calling a webhook, classifying NLU, transferring airtime lives in the router.

Advantages:

Disadvantages:

Action saves something, router saves result

The action should write data somewhere else in the context. It's not necessarily formatted like a result, but it's accessible in expressions so routing can still be performed by a regular switch router.

Advantages:

Disadvantages:

ericnewcomer commented 4 years ago

I have to say, the webhook/resthook/airtime cases have long made me feel that we were square pegging things into switch routers just because we could. The NLU case, even more dramatically so. I think having custom routers for some of these things is clean and appropriate. Maybe the distinction is switch routers route on user defined rules and the others are bespoke.

rowanseymour commented 4 years ago

I would imagine actually the editor is the place where the shortcomings of the current approach are most evident since it places the onus on the editor to manage the coupling of an action and a router. From an engine POV they know nothing of each other and we assume their categories will always match up.

Further thoughts:

nicpottier commented 4 years ago

I guess I will be the voice of dissent here. I think there's huge utility to having distinct classes on what things can do, and having routers only route is a big part of that.

I think we ended up where we are with webhooks because we didn't want to cross the bridge of having events be available in the context. I believe our original approach was actually having us look backwards in the event log to find the last webhook event, our expression used that to evaluate the state of things. But when we decided everything needs to be an expression (which I still think is right)

Personally I think I would rather have us expose events and build smarter expressions than have smarter routers.

ericnewcomer commented 4 years ago

I actually like that the editor has the ability to construct complex types that boil down to primitives, so that part I don't find super offensive. But for things that really feel like top-level things, like NLU, I don't see a ton of value tripping over ourselves trying to build it from primitives at the editor level.

ericnewcomer commented 4 years ago

On smarter expressions, fun side effect there is giving more utility in templates. How would you see exposing events to expressions working? It's that loose connection of which event you are looking at that feels yucky to me. Also, what if that event isn't there? I guess that's an Other?

rowanseymour commented 4 years ago

I guess I will be the voice of dissent here

Someone has to do it!

I think there's huge utility to having distinct classes on what things can do, and having routers only route is a big part of that.

I guess I would challenge you unpack what huge utility means in practical terms. I don't disagree that aren't benefits to that view of routers.. it's just right now I feel like I could more easily add a custom router type to the engine than something that works with a switch router.

The core of the problem as I see it is that the action ends up effectively doing the routing - since it has to hand the router a result which matches its exits, and then the router ends up being a mapping from result categories to exits which feels needlessly complex (as opposed to being too dumb).

So if we keep the action + router approach.. it'd be nice if the action just fetched and stored something in its most natural representation - and then the router generated a simple result from that which corresponds to the exit chosen.

Places the action could write its data...

I'm trying to remember what we hated about exposing events in expressions.. all I can think is that it does feel a little like exposing the internals of the engine to users. Makes me nervous to think what users would do with access to all events.. also are there issues here with URN redaction?

nicpottier commented 4 years ago

I guess I would challenge you unpack what huge utility means in practical terms. I don't disagree that aren't benefits to that view of routers.. it's just right now I feel like I could more easily add a custom router type to the engine than something that works with a switch router.

Having rules around what things can do what greatly eases the understanding of the engine and reasoning about how it works. The more we blur the lines the harder that will be. We did that in the old engine and we paid the price, it needs to be a really powerful arguments for me to feel we should abandon that.

The core of the problem as I see it is that the action ends up effectively doing the routing - since it has to hand the router a result which matches its exits, and then the router ends up being a mapping from result categories to exits which feels needlessly complex (as opposed to being too dumb).

Maybe I'm missing something, but I don't understand this. Why is that? Couldn't we have a "classify" action that takes the object to classify and creates a _classify result where the extra is the full set of intents and scores.

We then have a few expressions that are tailored to ingest that kind of result and do the right thing. Isn't that still simple? I guess I don't see how the "since it has to hand the router a result which matches its exits" is any more true than say a split by contact group or field or anything.

Note that I'm not sure we have actually even arrived at the right set of use cases for classifying. IE, as I'm seeing these apps I think the common case will be to have a single classifier connected to your account and a few different intents, but you may really want different routing based on the confidence of intents and where you do your splitting you may only care about one or two of the intents you can actually score.

To make that a bit more concrete, I could see you deciding to go down a different path is profanity is "> 90%" even if it isn't the top scoring. IE, "BOOK MY FUCKING FLIGHT TO MIAMI!" I could also see you having a bunch of possible intents on your app but you really are only interested in the case of pulling out specific entities from it and continuing on if greater than 90%".

"where do you want to go today?" "I'm flying to miami to seattle"

In that case you probably want to pass the response to your classifier but only continue on if the intent of "flying" or whatever comes back over 90% so you can extract the entities.

IE, I'm not so sure highest is always the right thing, though agree that is probably the most likely. So I think we have to solve for both.

So to me what maybe makes sense is that the classify action gives you a magic _classify result or something which has an extra of all the intents / entities / scores and then we have some expressions that can ingest those in various ways, either to take the full set and return true if above thresholds or sort them and pull out the top one or whatever. Those in turn would create results which would make the entity extracted a bit easier to access for the user.

What are the downsides there?

nicpottier commented 4 years ago

To be clear I'm not saying we have to support all these cases out of the gate, but they are an argument towards the action / router model as opposed to the smart router model. It is fairly easy to imagine just making an action that populates a _classify result and adding a top_intent(result.foo, .5) operand and that being enough, while also then giving us room to add the ability to do much more with expressions in the future.

nicpottier commented 4 years ago

Having things be pure expressions also allows us to offer escape hatches for more complicated uses cases. IE, you could potentially build a split first looking for profanity > 90%, then one that uses highest intent. Again, only really possible when these things are decoupled and pure expressions.

rowanseymour commented 4 years ago

Maybe I'm missing something, but I don't understand this. Why is that? Couldn't we have a "classify" action that takes the object to classify and creates a _classify result where the extra is the full set of intents and scores.

I'm describing the current approach of having the action save a single result for node. Yes we can could introduce a new special result and that's exactly one of the options you'll see I proposed in my last comment.

I prefer that over exposing events. We're careful about what we put in results.

nicpottier commented 4 years ago

I'm describing the current approach of having the action save a single result for node. Yes we can could introduce a new special result and that's exactly one of the options you'll see I proposed in my last comment.

Ah ok, sorry, didn't realize that was the settled direction.

Well then if it isn't obvious then I would say my vote is extra result and smart expressions, not a rethink of our router model.

rowanseymour commented 4 years ago

Ok so let's sketch out more what special results looks like. I don't hate _classify.. _airtime as a pattern... cuz I'm pretty sure users can't create results like that right now.. pretty sure I need to tweak excellent to allow names that start with _

In the case of an NLU action then.. maybe it saves...

{
  "name": "_classify",
  "value": "0.92",
  "category": "success",
  "input": "book me a flight to Quito!",
  "extra": {
    "intents": [
      {"name": "book_flight", "confidence": 0.92},
      {"name": "book_hotel", "confidence": 0.08}
    ],
    "entities": {
      "location": [
        {"value": "Quito", "confidence": 1.0} 
      ],
      "date": [
        {"value": "May 21", "confidence": 0.6}
      ]
    }
  }
}
{
  "name": "_classify",
  "value": "0",
  "category": "failure",
  "input": "book me a flight to Quito!",
  "extra": {}
}

But the router saves something like...

{
  "name": "Intent",
  "value": "0.98",
  "category": "book_flight",
  "input": "book me a flight to Quito!",
  "extra": {
    "location": [
      {"value": "Quito", "confidence": 1.0} 
    ],
    "date": [
      {"value": "May 21", "confidence": 0.6}
    ]
  }
}
{
  "name": "Intent",
  "value": "0.2",
  "category": "Not Sure",
  "input": "book me a flight to Quito!",
  "extra": {}
}
{
  "name": "Intent",
  "value": "0",
  "category": "Failure",
  "input": "book me a flight to Quito!",
  "extra": {}
}

It depends on the amount of functionality we want to expose for this first attempt at NLU but if we just want the node to offer a single minimum confidence value which routes to a "Not Sure" category... then it seems we can do all that with existing case-test functions.

nicpottier commented 4 years ago

Note that I would also be ok for a classify "node" with a result name of foo to create two results (one from action, one from router) of foo_classification and foo. That may be more natural though could get clobbered by the user I suppose. (and vice versa) But maybe clearer / easier than magic _classify names? Not sure, could probably be convinced of either.

Do these results appear in exports?

No I don't think so.

Do they appear in auto-complete?

I think long term yes? Just because part of the point is to allow users to escape hatch into more powerful use of this stuff so no reason to hide it there.

Do they get run_result_changed events?

I think so yes. At least I can't think of any good reason at an engine level why we wouldn't.

In your examples above what is your operand and tests?

nicpottier commented 4 years ago

Pro to doing a suffix is of course that you could refer to previous classifications and you aren't always clobbering the last.

rowanseymour commented 4 years ago

Pro to doing a prefix like _ is that can then have meaning such as "don't include this in exports".... could have _foo_classification

nicpottier commented 4 years ago

True.. though do we allow result names like 'Hello' which get turned into _Hello_? As I said I'm down for either, the named prefix version does seem to hold the advantages of both with minimal possibility for clobbering.

rowanseymour commented 4 years ago

At an editor level...

Captura de Pantalla 2019-10-02 a la(s) 16 40 51

At an engine level you could have a result called 'Foo which would be snaked as _foo and you wouldn't be able to write @results._foo but you could write @results["_foo"]. Obviously something we can change if we want. Recall the workaround for result names that start with numbers was @results["2"] because @results.2 would be an array lookup.

nicpottier commented 4 years ago

I think you CAN do Main though.

nicpottier commented 4 years ago

(leading space there)

nicpottier commented 4 years ago

but easy enough to change and not worried about it.. I'm down with leading _

ericnewcomer commented 4 years ago

Catching up here..

Keeping things decoupled but with expressions against results instead of events sounds like the best of both worlds. So totally agree with where you guys ended up on that bit.

So for rules, is the thinking that we'd have an operand like @nicpottier suggested.. @(top_intent(result.foo, .5)) and then just a matching rule on the intent name returned?

rowanseymour commented 4 years ago

Ok forgetting what I said about using existing case tests cuz I can't make that work...

Operand could be.. @results._foo_classification

Test function has_intent(result, name, min_confidence) takes the result and returns true if the name of the top intent is name and it's confidence is >= than min_confidence

Handling the other cases is a bit weirder. We have several possibilities:

  1. The top ranked intent is something other than we know about
  2. There are no intents returned from the NLU provider
  3. None of the intents meet a minimum confidence
  4. The result doesn't exist
  5. The result does exist but has category of "failure"

I think we'd want 1,2 and 3 to be categorized as "Not Sure" and 4, 5 as "Failure". In which case we'd want to explicitly match 1, 2 and 3... and let 4, 5 fall into the default category of the router.

To match 1, 2, or 3.. could..

rowanseymour commented 4 years ago

Above is based on the assumption you want individual confidence thresholds for intents but if you don't need that.. then could definitely put the function call in the operand.

If top_intent(result, min_confidence) can return an intent name or an error... then you can make have a

  1. series of tests which match explicit intent names => Intent Categories
  2. a has_error test => Category "Failure"

And the the router's default category catches the "Not Sure" case

nicpottier commented 4 years ago

Is this made simpler by having "Not Sure" just be "Other" and that is a fall through / last rule? Seems you could then still use the has_intent as you have above and have 1,2,3 do the right thing of ending up in "Other".

4) seems like a flow authoring error so not too worried about that falling through to Other.

5) can then just look at category of "failure"

ericnewcomer commented 4 years ago

Ya, ultimately this is where we want to be having the confidence on the rule, might as well start there. Agree we need to separate failure into its own route. However, should failure be the default or should it be not sure? I feel like the default would feel more correct as not sure and explicitly matching for a failure route makes sense. Not critical, but might change what functions we need.

ericnewcomer commented 4 years ago

I think @nicpottier and I just said the same thing slightly differently.

nicpottier commented 4 years ago

Note that I think we likely need BOTH top_intent and has_intent and top_intent seems like the minimum viable feature for this. But we should have a path towards the latter too.

nicpottier commented 4 years ago

Maybe top_intent is simply a filter on the result that strips all but the top intent and has_intent is still used on all the rules? So we use top_intent in the operant but all the tests are still has_intent?

rowanseymour commented 4 years ago
  1. seems like a flow authoring error so not too worried about that falling through to Other.

@nicpottier that would depend on under what error conditions the action saves a result. There's several paths that could lead to a call_webhook action not saving a result but we could guarantee that the classify action always saves a result in cases e.g. there's no NLU provider etc cuz those feel like they should be treated as failures.

nicpottier commented 4 years ago

Hrmm.. ya I guess we get to make the rules. For this type of thing I think I feel that the result should always be written but with a category that covers different failure modes. That feels consistent with the webhook case no? Either way it feels most correct for these kind of _implicit_results for the very reasons stated here. :)

rowanseymour commented 4 years ago

Switch router operands do have the unique property of being typed so the operator in this case can evaluate to an intent object which is passed to tests. In which case operand could be...

@results._foo_classification.extra.intents.0

which evaluates to either

Then you just need a test function that can test name is x and confidence is > y

nicpottier commented 4 years ago

You lose entities in this case though. (assuming that has_intent returns both the matching intent and entities)

nicpottier commented 4 years ago

Actually where DOES extra come from? Now I'm confused again.

rowanseymour commented 4 years ago

@nicpottier what I'm saying is that currently the call_webhook action has plenty execution paths that don't return a result.. but we can change that... seems reasonable to say that all these result saving actions should always save a result.

rowanseymour commented 4 years ago

@nicpottier extra is set by the test function... so I think we're back to the operand being the whole result from the action... and then has_intent on each category taking the entire result

ericnewcomer commented 4 years ago

Ya, I don't really see there being any real savings to implementing a router level confidence if we want to go to per-rule anyways. It doesn't seem like the simplicity there ought to matter for users doing NLU stuff as well.

nicpottier commented 4 years ago

So just to wrap this up, can't we just implement a top_intent() expression that takes a full classify result and strips out all the intents except the highest? That gets us where we want to be to be able to use has_intent() the same both for the "split by top intent over this confidence" and "let me define an ordered set of intent rules"

rowanseymour commented 4 years ago

I need to clarify what the top_intent you are proposing does. I think you're saying its job is to take a result structured like:

{
  "name": "_classify",
  "value": "0.92",
  "category": "success",
  "input": "book me a flight to Quito!",
  "extra": {
    "intents": [
      {"name": "book_flight", "confidence": 0.92},
      {"name": "book_hotel", "confidence": 0.08}
    ],
    "entities": {
      "location": [
        {"value": "Quito", "confidence": 1.0} 
      ],
      "date": [
        {"value": "May 21", "confidence": 0.6}
      ]
    }
  }
}

And return an object structured like:

{
  "intent": {"name": "book_flight", "confidence": 0.92},
  "entities": {
    "location": {"value": "Quito", "confidence": 1.0},
    "date": {"value": "May 21", "confidence": 0.6}
  }
}

or if the top intent < the confidence passed:

{
  "intent": null,
  "entities": {
    "location": {"value": "Quito", "confidence": 1.0},
    "date": {"value": "May 21", "confidence": 0.6}
  }
}

And then has_intent(name, [confidence]) expects an object in that format. If the intent name == name and it's confidence is >= confidence then it generates a test match like

{
  "matched": true,
  "match": "0.92",
  "extra": {
    "location": {"value": "Quito", "confidence": 1.0},
    "date": {"value": "May 21", "confidence": 0.6}
  }
}

Which then the router can convert into a run result like

{
  "name": "Intent",
  "value": "0.92",
  "category": "book_flight",
  "input": "book me a flight to Quito!",
  "extra": {
    "location": {"value": "Quito", "confidence": 1.0},
    "date": {"value": "May 21", "confidence": 0.6}
  }
}

So I guess I'm not quite getting the point of top_intent.. why not let has_intent just take a result which then also makes it easier to use in normal expressions has_intent(results.foo, 0.5) vs has_intent(top_intent(results.foo, 0.5))

rowanseymour commented 4 years ago

Other small argument for making the result the operand - we could have a has_category test that takes a result which I think solves some routing issues here and would be re-usable for other node types that follow this pattern in the future

ericnewcomer commented 4 years ago

I don't see the need to do top_result mostly because it seems there isn't any thing to be gained with a router level confidence (at least I don't think there is from an effort standpoint) other than maybe simplicity for the user -- which in this particular case, I don't think is really necessary.

Maybe you can help me understand the reasoning to implement a router level confidence before we do per-rule?

nicpottier commented 4 years ago

Sorry, maybe I'm missing something, how do we implement someone wanting to route based on the top intent if we don't have top_intent? The idea of top_intent is to take a full result and strip out all but the top intents, returning a result looking thing. It doesn't take a threshold, its job is just to strip to only the top.

You then has has_intent take care testing thresholds. That is the same whether working on a operant of top_intent(results._classify_foo) or results._classify_foo

ericnewcomer commented 4 years ago

I was confused because the top_intent example you gave took a confidence. Now a function to take the highest regardless of confidence? That would be a different thing. Is that really something you see people wanting to do though? Seems there needs to be some minimum confidence.

nicpottier commented 4 years ago

The filtering would then be done by has_intent

I really don't care how it is done, just asking what is your answer to getting the top intent in the world you are talking about?

rowanseymour commented 4 years ago

The version of has_intent I've been describing uses the top intent from the result..

nicpottier commented 4 years ago

Oh, it only looks at the top intent? Then how do you have more than one set of confidences?

Again, two use cases we want to think about:

  1. split by top intent (above some threshold)
  2. given a list of intents / thresholds, pick first that matches (that's the profanity is high but not highest use case)
nicpottier commented 4 years ago

My understand of has_intent(results, "order_coffee", .5) was that it looked at all intents in the result and returned the matching intent if it was above the threshold. IE, much closer to "has_word" etc..

ericnewcomer commented 4 years ago

Sorry, I'm just dense. I just never got past the reason there. So the scenario you are talking about is having two intents matching above the confidence and wanting to use the highest one, not the implied order in the rules. Still feels a bit edgy to me, but don't see any reason to not allow that too.

rowanseymour commented 4 years ago

Ok then we're thinking about different use-cases. My understanding (same as @ericnewcomer's) of having per-intent confidences was that you're still answering the question "what is the top intent".. just that each intent has its own minimum confidence before the flow punts to Other and re-prompts the contact. You're saying in the second usecase the tests become ordered matches looking for a specific intent...

To be honest it's hard to imagine how useful/un-useful either version of the multiple confidences without having ever tried this. I think we agree that for now we're looking for a solution for the first use-case which is extendable in the future to something like the second.

But if the 2nd use case as you understand it is a must.. I think it would be preferable to use the result as the operand and have two test functions like has_top_intent and has_any_intent. That way we don't have to add an excellent function which knows about flow stuff like NLU results.

nicpottier commented 4 years ago

I'm fine with has_intent and has_top_intent that seems to provide all the flexibility for both use cases.

nicpottier commented 4 years ago

You're saying in the second usecase the tests become ordered matches looking for a specific intent...

Yes, which you need for use case 2) above. IE, shortcut if profanity or sentiment is bad above some threshold.