Capability to create safely root data stores in attached container mode

vladsud commented 3 years ago

FF lacks capability to resolve conflicts on creation of data stores or DDS.

Canonical scenarios include:

Change in document schema by container developer that requires addition of new root(-y) data store. Client on demand (after opening file) determines such new data store needs to be added and attempts to add it. The only safe way to do so today is by using ConsensusRegisterCollection. Which means container should have it before such need arises (I'd argue it pushes the need for every ContainerRuntime or root data store to have it by default to protect against future requests).
Some expensive object needs to be created on demand, on as needed bases.

Similar problem exists for DDSs as well, but we see our customers' needs mostly in data store space.

Note that ideal solution moves this problem to a merge on setting map/directory key to be a handle to newly created object (data store) and providing capabilities to clients to react on merges. However last-writer-wins merge policy is not ideal for such changes and CAS (compare-and-swap) is more what is needed, pushing toward ConsensusRegisterCollection type of solution.

Sub-issues:

[x] https://github.com/microsoft/FluidFramework/issues/7998
[x] #7948
[ ] https://github.com/microsoft/FluidFramework/issues/8698

vladsud commented 3 years ago

Here is possible escape route (not ideal, but will solve 99% of existing problems):

attach op for data store no longer results in container failure on name collision. Instead it results in no data store created (second time), but container moves on.
A peer to today's IFluidDataStoreContextDetached.attachRuntime() is created that waits for attach op ack and returns success or failure (based on above)
IContainerRuntime.createDetachedRootDataStore() is changed to only work in detached container mode (today, "detached" in method name refers to detached context, not container state, a bit confusing - see # 7 as well).
IContainerRuntime.createRootDataStoreLive is added (similar to # 3) to create root data stores in non-detached container state. It returns context interface that has only # 2 (i.e. ability to wait for attachment and receive success of failure).
PureDataObjectFactory.createRootInstance() would be modified to chose between those two paths - # 3 or # 4 depending on state of container (possibly split API into two to be more explicit, especially that in future I expect one to be synchronous, and one - async).
No changes for IContainerRuntime._createDataStoreWithProps() API - it continues to be marked as deprecated.
IContainerRuntime.createRootDataStore() is removed from interface, stand-alone utility (for easier use) can be added to go through # 3-4 (and thus be split with these changes into two APIs). Or maybe - do not provide it, pushing people toward Aqueduct.
- This will allow reuse of the name, i.e. createDetachedRootDataStore (without all these changes) should be called createRootDataStore

I do not think it adds too much complexity RE long term support. I want to disallow (in some future) ability to create root data stores in non-detached mode (i.e. steps # 2, 4, and part of # 4) with or without this change and move all collisions to be on changing DDSs values. I.e. even without these changes, I want to eventually deprecate creation of root components in non-attached state, today it works by luck (if it does), with these changes it will work correctly (though require active connection) until we could deprecate it.

anthony-murphy commented 3 years ago

I like this solution. I actually prototyped something similar in the past: https://github.com/microsoft/FluidFramework/pull/4247

anthony-murphy commented 3 years ago

i would consider less methods, and drive the differences by config. I think the default should be slow but correct, but i wouldn't block the fast path (detached create on attached container), it would just require outside coordination

curtisman commented 3 years ago

I agree that this is a good solution to unblock. However, introducing "temporary" API/function does have a high risk of it sticking around for a long time. I think we should think thru the following:

If this functionality stacks around, what are the implications?
In order to get off this solution, we need to have an alternative. Current thinking is that "merging" is the long term solution. We should at least think thru what that looks like and how it solves the problem, to give us confidence that we can move off this temporary solution.

Also, given that this is temporary solution, I'd expose it as little as possible. We should assess whether we really need to have it in Aquaduct right now.

anthony-murphy commented 3 years ago

I don't necessarily see this as temporary. this is just race creation, which seems fine even long term. You kick off a create, but don't get anything until the race is resolved, and then you configure the object via ops.

this can be augmented this with 3 way merge, where you give back a local copy, race on resolution and then merge that (still producing ops). in my mind this pattern is mediated on the existence of some merge api.

this has some advantages over handle racing, as it won't produce unreferenced objects, the disadvantage being local state isn't persisted to the server. i also still don't have a clear idea how to implement handle racing.

vladsud commented 3 years ago

I've chatted with @DLehenbauer to understand better what is available in other experimental DDSs and in external solutions. Some number of themes:

In general if application data can be structured as a tree (weather it's single DDS, or not), framework could do more in regards to merging, including creation (of sub-trees) automatically.
- Note that cycles might be present in in the graph at a different logic level (not visible to DDS / framework), that could cause trouble, so usually is avoided, but app layer may decide to allow it and carefully manage it.
Some solutions rely on preserving history and letting app interpreting colliding changes.
Opportunistic (i.e. not using synchronization / blocking) solutions rely on rejecting conflicting change (from how all but one clients see it) and letting "looser" deal with it on realization of "nack", i.e. merge and submit update.
In couple places we do see CAS or CAS like implementation on DDS exposed as unique API (replace()) while same DDS supports opportunistic set() API. That way it's clear what's the intention of the caller.

In general, I think the feeling here is that

the app that is uniform (i.e. no components that are developed in isolation) could rely on something like single tree DDS that helps manage problems raised here.
For other cases, we would need a tool box of solutions, and adding capabilities like CAS on map or concurrent creation of root data store are pieces in this toolbox. We will evolve toolbox, but that may not mean we will deprecate earlier built tools, even if there are better / more holistic approaches built later (like 3-way merge), as their scope and when they are applied would be conditional, i.e. useful / right tool under certain conditions, not always.

This is, of course, my interpretation (and somewhat my narration) based on discussion with Daniel, Daniel - please correct me / add your point of view, please.

vladsud commented 3 years ago

To add to earlier replies by Curtis & Tony: Leaving API shape question aside (i.e. where and how it is exposed), I think it comes down to us pushing engineers toward a solution vs. giving them multiple solutions.

For example, pushing toward 3-way merge might be good in terms of us unifying our focus and having one way of doing things. At the same time that will likely have limitations (for example, we may say - we support it only for tree document structure, not graph with cycles) and/or it may have much higher cost for developer to use it.

Race-based APIs are much simpler to use (are much more isolated RE impact on whole app), but have obvious cons (does not work offline, latency).

I think it boils down to this, and overall it's case by case (i.e. in other areas I want us to push developers away from patterns that usually are harmful). Here, I feel like choices are not harming at very high level (synchronization vs. opportunistic model changes), but might be harmful at layer below (where and how synchronization is exposed - only as CAS on map/directory, or root data store creation, or both).

vladsud commented 3 years ago

After a bit more discussions, I want to pivot back to CAS on the map/directory (via having dedicated method). We would need to spell out how it works and how it overlaps with other operations (not using this method) on same key, of course. This will allow us to disallow creation of root data stores in attached state, i.e. push people toward single tool across wide range scenarios where they might need to synchronize creating flow.

Solving it as originally described (at container runtime level) would definitely remove pressure for existing cases, but eventually we would need to have a solution at data store level. Similar approach (RE using channel creation / channel name as a pivot) would work here as well, but it's not clear (yet) if it would solve all the future scenarios, and it's possible we will come back to having a need for map/directory to support it.

Note that I believe having CRC (ConsensusRegisterCollection) for that is not good answer, as it assumes creating it beforehand (before developer realizes he/she has an issue of delayed creation) and it makes it harder to manage set of references to other components / DDSs split between directory and CRC.

If this direction does not pan out, then I want to implement original proposal to solve it at container runtime / root data store level.

@ChumpChief - curious to hear what you think.

anthony-murphy commented 3 years ago

i was thinking a bit about this space, and wanted to share some rough ideas I've had, which i'm not convinced are good yet, but could spark good discussion.

Something i haven't been a huge fan of is how all solutions seem to only support very specific dds, or need very specific dds support, like use a ConsensusRegisterCollection or add CAS to map keys. As vlad points out i may not have a ConsensusRegisterCollection, but also i may not have a map, i could only have a sequence, or some other dds.

For the sake of discussion i'm going to call my proposal a contingent op. The idea is that you create a normal dds op, or potentially ops, but those ops are nested within another op, which specifies meta data about how they should be applied. My thought is that these ops, would be sent by and to data stores who would own defining, and interrupting the metadata, and choosing if the nested ops should be applied. While the data store own it I would imagine us building some common handlers, for map key, or shared string markers to implement things like CAS in a common and reusable way.

ChumpChief commented 3 years ago

I think this conversation generally seems to be converging on "What preconditions are OK to assume for the developer (CRC, map, nothing)?" I don't have a strong opinion here, other than I feel like the await-for-agreement approach is sufficiently solved IF the developer already has a CRC. And so maybe the solution doesn't have to be "enable their existing map to solve the problem" or "enable any data setup to solve the problem" but can instead just be "help them get a CRC if they don't have one".

FWIW I think it's already possible to implement "messy first-writer-wins" on a map/directory via carefully named keys. Would be a little bit more op overhead and eventing considerations, but seems functionally doable without API changes and might be good enough to "help them get a CRC if they don't have one".

anthony-murphy commented 3 years ago

I really like this articulation "What preconditions are OK to assume for the developer?"

I think the other dimension i think about is granularity. If we use a dds based solution what scope or granularity is reasonable? per container? per data store? per dds? per collection item? This has impact on things like conflicting identifiers, and summary blob counts.

DLehenbauer commented 3 years ago

Yeah... I was trying to figure out what scenario we're discussing where the data object can't just conjure up a CRC in initialize component and set it's handle in it's root map, matrix, or whatever?

@anthony-murphy - Your 'contingent op' scheme is very similar to what Whiteboard implemented in the SharedTree DDS, which I think is a pretty pragmatic way to let apps get involved in conflict resolution without tackling the complexity of authoring a DDS.

The SharedTree system in a nutshell:

The app can combine ops into atomic groups (like the atomic groups we have today).
Groups carry a list of app-specified preconditions (e.g., at the time I inserted this node, the parent had zero children.)
If after sequencing the preconditions are no longer true:
- the group is rejected and remote clients simply ignore it.
- the local client reverts the group and invokes the application to retry/repair.

Like the scheme Tony proposed, the application captures command metadata on the op group to help it later process rejections (e.g., 'this op group was a "destructive" 1-time sort command applied to children in the range x..y').

It would be interesting to explore hoisting the SharedTree constraint/transaction system to the container level to enable cross-DDS transactions, but I'm not sure I'd go there yet to solve the issue of the app getting it's hands on a CRC or SharedTree.

vladsud commented 3 years ago

"contingent op" (the way it described initially) would require support from every DDS. For example, DDS like map may not keep prior state before local change, because it knows that last-writer-wins policy ensures it does not care about such state while local change is in-flight (not acked). So it's a big change in expectations for all DDS writers. That said, it allows local transactions, i.e. implementation of a bunch of local changes (pausing sending them) that later could be reverted if something went wrong. This is one of the asks for consideration from devmain/Word, as most of the devmain apps today have that native capability in their data model (either direct, or indirect through shallow doc cloning and merging changes back to original doc). So if we decide to implement it (and implement through such construct), then it might be easy / good addition.

I'm not big fan of going CRC DDS route (for that problem) as it makes dev story more complicated for end user. Now some of object references are in a root map/directory, and some in root CRC. Fundamentally, it's same key namespace (we can think of it as map1/key1 & crc2/key2), with different set of operations allowed on subsets of that namespace. There are clear advantages of this approach (no need to define how operations on same key from map & crc world resolve when applied to same key), and clear disadvantages (more DDSs, more concepts for developers, more (smaller) blobs).

Tree DDS approach is equivalent IMHO to map/directory with CAS - it's single DDS that can do it all, without requiring the rest of the world to be more complicated.

If we go with pre-conditions (in either of the forms), I'd worry about letting user code to decide if pre-condition was met. That would likely lead to bugs where two clients react differently and eventual consistency is broken. So I'd look for some limited set of expressions we (FF) support. It maybe bigger than checking single key, but it's a limited set. We might end up with a lot of complexity here down the road (of supporting whole (new) language to do rich expressions), but I'd rather have control.

vladsud commented 3 years ago

For what it's worth - I was poking at somewhat unrelated #6571 (DDS summarization and asynchrony / unifications of summarization flows across layers) and I think we should move away (for most part) of attachment of fully initialized objects (data stores & DDSs). That also would reduce pressure in this area, as creation of object really becomes declaration of the type / schema, and thus could be mergeable. It does not solve fully this area (named objects creation in attached container where initialization of such objects is required), as named objects are visible on creation before they are initialized (i.e. problem shifted to different domain, but not addressed). But it's not very clear how big this area is, and if it could be solved via other means (for example, allowing objects to provide their initial state that all clients initialize with without any ops)

More thoughts in internal doc, will move to public forum once I have some initial feedback

vladsud commented 3 years ago

@andre4i - this is next item on your list :) Please take a look at this issue, document above, and let's chat.

Current plan is as follows:

Objects are attached to container on creation immediacy, not when their initialization is complete.
- this should have no impact for unnamed objects, as they should not be accessible from other containers yet (their name is random and unknown to others, until object is referenced through handle somewhere else).
- For named objects, we will add ability to assign alias name, i.e. delay giving well-known name until right time (from caller perspective). "Assign alias" === "compare-and-swap operation on name", i.e. clients race and only one wins this race. Usage of this capability, ideally should be more of an exception than the rule (read more below).
  - Aliasing will require being connected, however client can move forward in disconnected state with safe assumption that object with such name is guaranteed to be in container (it might be different instance though, if aliasing fails).
Move toward "creation" === "schema definition"
- allow data stores & DDSs to control what happens when two "creates" collide. In most cases we want that to be no event, i.e. "creates" to coalesce. Note that we need this only if we allow named object creation. If we move toward all creations are unnamed and aliasing is used to give a name, then we can punt on all of that.
- Such flow can be used for creation of named objects IF their initialization is recursive creation (defining schema), and nothing else (i.e. no population of initial values in DDSs).
- In future, we will could allow objects to define their initial state that is implicit, i.e. all clients start with this state on creation, and thus there is no risk of such initial state overwriting future user content that is written into such objects. This assumes that all objects ensure all clients have same idea about initial state (eventual consistency) and if any changes are required in such initial state, that results in version bump of object type, and as result, objects need to reconcile 2.a. (i.e. creation from two clients with different versions of object factory can no longer coalesce seamlessly).

In terms of scheduling, aliasing is # 1, as it's stand-alone task, solves immediate problems, and enables other flows.

andre4i commented 3 years ago

https://github.com/microsoft/FluidFramework/pull/7486

vladsud commented 3 years ago

@andre4i - this is currently September item, and your October list is pretty empty. I assume what is missing is break down of this item into actionable steps and assigning to right milestones. Could you please do it before October sprint planning meeting today? I'd guess that same needs to be done for any other work items on your list that are in progress (like implicit batching).

andre4i commented 2 years ago

https://microsoft-my.sharepoint-df.com/:w:/p/andreiiacob/EVCLe2ZL1sFCsCV8RUaFLj8BKNCdtsH7hrR_hAYUzJ4c6g?e=r2q3mJ (internal)

andre4i commented 2 years ago

https://dev.azure.com/fluidframework/internal/_workitems/edit/143/

microsoft / FluidFramework

Capability to create safely root data stores in attached container mode #6465