microsoft / botframework-sdk

Bot Framework provides the most comprehensive experience for building conversation applications.
MIT License

Solving Bot State Concurrency: retryable writes #6370

Open Stevenic opened 3 years ago

Stevenic commented 3 years ago

Issue

Currently, all bot state changes are last-writer-wins. This strategy spares bot developers from writing complex retry logic, and it generally works well for the dialog stack when a user blasts the bot with multiple messages. There are, however, a lot of places where it would be desirable to make more deterministic and reliable changes to bot state. Toggling flags or incrementing counters within group conversations are great examples.

Proposed change

I'd like to propose that we consider adding support for a set of Retryable Write Operations similar to those supported by MongoDB. These operations let the developer describe the change they'd like made to an object in a way that can be re-tried should a conflict occur.
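To make the idea concrete, here is a minimal sketch of what such change descriptors could look like and why they can be replayed. The descriptor shape (`op`, `path`, `value`), the dotted paths, and the `apply_changes` helper are assumptions for illustration only; `$set` and `$inc` borrow MongoDB's update operator names, while `$toggle` is a hypothetical operator that MongoDB does not define.

```python
import copy

def apply_change(state, change):
    """Apply one change descriptor to `state` in place."""
    *parents, leaf = change["path"].split(".")
    target = state
    for key in parents:
        target = target.setdefault(key, {})  # create missing parents on the way down
    op = change["op"]
    if op == "$set":
        target[leaf] = change["value"]
    elif op == "$inc":
        target[leaf] = target.get(leaf, 0) + change["value"]
    elif op == "$toggle":  # hypothetical operator, not a MongoDB one
        target[leaf] = not target.get(leaf, False)
    else:
        raise ValueError(f"unknown operation: {op}")

def apply_changes(state, changes):
    """Return a copy of `state` with every change applied in order."""
    result = copy.deepcopy(state)
    for change in changes:
        apply_change(result, change)
    return result

changes = [
    {"op": "$inc", "path": "poll.votes", "value": 1},
    {"op": "$toggle", "path": "poll.closed"},
]

# Applied to the copy this turn originally read.
mine = apply_changes({"poll": {"votes": 3}}, changes)

# If another turn saved first, the same descriptors can be replayed on the
# fresher copy without losing that turn's votes.
fresher = {"poll": {"votes": 7}}
replayed = apply_changes(fresher, changes)
```

Because the descriptors record the intent ("add 1") rather than the outcome ("votes is 4"), replaying them after a conflict keeps both writers' updates.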

Supporting these types of operations within the Bot Framework should be as simple as updating the underlying BotState class. The operations would be encoded in a $changes property off the root of a state object, and the BotState class can apply these changes to the original state object before saving. Should the save fail due to an eTag collision, the BotState class can re-read the object and re-apply the changes, retrying with an exponential backoff strategy.
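A rough sketch of that retry loop follows. None of this is the SDK's API: `MemoryStore`, `ETagConflict`, and `save_with_retry` are hypothetical names, the store uses an integer version as a stand-in for an eTag, and the loop re-reads on every attempt for simplicity. Jitter is left out to keep the example short.

```python
import copy
import time

class ETagConflict(Exception):
    """Raised when the eTag passed to write() is stale."""

class MemoryStore:
    """Toy storage with optimistic concurrency (integer eTags)."""
    def __init__(self):
        self._items = {}  # key -> (etag, value)

    def read(self, key):
        etag, value = self._items.get(key, (0, {}))
        return etag, copy.deepcopy(value)

    def write(self, key, value, etag):
        current, _ = self._items.get(key, (0, {}))
        if etag != current:
            raise ETagConflict(key)
        self._items[key] = (current + 1, copy.deepcopy(value))

def save_with_retry(store, key, changes, apply, max_attempts=5,
                    base_delay=0.05, sleep=time.sleep):
    """Read, apply `changes`, and write; on conflict back off and start over."""
    for attempt in range(max_attempts):
        etag, state = store.read(key)
        apply(state, changes)  # mutates the freshly read copy
        try:
            store.write(key, state, etag)
            return state
        except ETagConflict:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.05s, 0.1s, 0.2s, ...

def increment(state, amount):
    """Example change: bump a counter by `amount`."""
    state["count"] = state.get("count", 0) + amount
```

Passing `sleep` as a parameter is only there so the backoff can be observed or skipped in tests; a real implementation would likely also cap the delay.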

To prevent current code or components from having to be modified, the BotState class can diff the original state object against the object being saved and convert all differences into $set operations. Doing so essentially preserves the current last-writer-wins strategy even though we're now attempting to manage concurrency. Any new code can be written to be smarter where it makes sense.
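The diff step could look roughly like the sketch below. `diff_to_sets` is a hypothetical helper, the dotted-path format is an assumption, and emitting `$unset` for removed keys goes beyond what the proposal describes (it only mentions `$set`).

```python
_MISSING = object()  # sentinel for "key not present in the original"

def diff_to_sets(original, updated, prefix=""):
    """Describe how `updated` differs from `original` as a list of operations."""
    ops = []
    for key, new in updated.items():
        path = f"{prefix}{key}"
        old = original.get(key, _MISSING)
        if isinstance(old, dict) and isinstance(new, dict):
            # Recurse so only the changed leaves are written, not the whole subtree.
            ops.extend(diff_to_sets(old, new, path + "."))
        elif old != new:
            ops.append({"op": "$set", "path": path, "value": new})
    for key in original:
        if key not in updated:
            # Assumption: deletions become $unset; the proposal does not cover them.
            ops.append({"op": "$unset", "path": f"{prefix}{key}"})
    return ops

original = {"user": {"name": "a", "seen": 1}, "flag": False, "stale": 1}
updated = {"user": {"name": "a", "seen": 2}, "flag": True}
ops = diff_to_sets(original, updated)
```

Replaying these `$set` operations on a re-read object overwrites whatever another writer stored at the same paths, which is exactly the last-writer-wins behavior existing code expects, while untouched paths keep the other writer's values.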

Component Impact

Customer Impact

Customers shouldn't see any change given that the current last-writer-wins strategy is maintained.

Tracking Status

Dotnet SDK [TODO]()

Javascript SDK [TODO]()

Python SDK [TODO]()

Java SDK [TODO]()

Samples [TODO]()

Docs [TODO]()

Tools [TODO]()

Stevenic commented 3 years ago

One slight refinement would be to not only queue up the $changes but also apply them immediately. This lets developers still run expressions against the updated memory as if the changes had been successfully applied. When it comes time to look for legacy $set operations, you just ignore any paths that are referenced by an operation in $changes.

This basically means that by default we check the diffs between the new object and the cached object to see if there are any $sets we should add to the $changes list. We then try to save the new object as is. If that fails, we read in the updated object and reapply all of the $changes.
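Here is a small sketch of that refinement, restricted to top-level keys to keep it short. `TrackedState`, `pending_changes`, and `replay` are hypothetical names, and placing the legacy `$set` operations ahead of the explicit ones is an assumption about ordering.

```python
import copy

_MISSING = object()

class TrackedState:
    """Keeps the cached snapshot, the working copy, and the queued $changes."""
    def __init__(self, loaded):
        self.cached = copy.deepcopy(loaded)  # as read from storage
        self.data = copy.deepcopy(loaded)    # what the bot's code sees
        self.changes = []                    # explicit operations, in order

    def inc(self, key, amount=1):
        self.changes.append(("$inc", key, amount))
        self.data[key] = self.data.get(key, 0) + amount  # applied immediately

    def pending_changes(self):
        """Legacy $sets from the diff, skipping keys an explicit op touched."""
        touched = {key for _, key, _ in self.changes}
        legacy = [("$set", key, value) for key, value in self.data.items()
                  if key not in touched
                  and self.cached.get(key, _MISSING) != value]
        return legacy + self.changes

def replay(fresh, changes):
    """Reapply every change to a freshly read object after a failed save."""
    result = copy.deepcopy(fresh)
    for op, key, value in changes:
        if op == "$set":
            result[key] = value
        elif op == "$inc":
            result[key] = result.get(key, 0) + value
    return result

state = TrackedState({"votes": 3, "topic": "a"})
state.inc("votes")         # new-style retryable change
state.data["topic"] = "b"  # legacy direct mutation, found by the diff
```

Skipping `votes` during the diff matters: without it the diff would add a `$set` of `votes` to 4, and a replay would clobber any votes another writer recorded in the meantime.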

cleemullins commented 3 years ago

Let's assume that a bot doesn't have a high transaction rate (TPS), and certainly not high concurrency rates.

There are two obvious fallouts from there:

  1. Why wouldn't we just make data operations genuinely transactional? We're not worried about database load, certainly not at the maximum rate of 1 transaction per second that a conversation would see. Our unit of sharding is the Conversation, and conversations are almost always between 1 bot and 1 person. That's a low-transaction environment.

  2. If we were worried about database load, why wouldn't we use 100% standard data-scaling methods?

Even better, we should 100% offer a "We're not involved" mechanism by which the developer of a bot can handle this themselves. Not using the IStorage mechanism, but just the "StateIn / StateOut" calls, the same way any data-oriented technology would.

ASP.NET, to use a Microsoft-centric tech, has far higher transaction rates and offers the "It's your problem" back door. Other data-centric tech takes the same approach.

Stevenic commented 3 years ago

@cleemullins the issue is we don't have a simple way for developers to code their retry logic. You could cobble something together in C#, but for Composer or PVA it's just not possible.