SharedTree: API layering discussion

microsoft / FluidFramework

Library for building distributed, real-time collaborative web applications

https://fluidframework.com

MIT License

4.72k stars 532 forks source link

SharedTree: API layering discussion #8989

Closed DLehenbauer closed 2 years ago

DLehenbauer commented 2 years ago

Introduction

When it comes to interacting with application state, there are two "flavors" of API in common use:

Imperative style APIs where the application reads/writes data via a mutable tree of observable objects
Functional style APIs where the application reads immutable snapshots and queues edits for a future revision

Imperative APIs with mutable objects tend to be more concise and approachable. Mutable data can also offer efficiency advantages due to reduced copying. On the other hand, the state isolation offered by functional APIs with immutable objects offers advantages when composing concurrent/reactive systems.

More relevant than their strengths or weaknesses is the fact that the choice of an imperative or functional programming model has ripple effects in an application's architecture, creating friction and inefficiency when attempting to shoehorn a functional data structure into an application using an imperative style, or vice versa.

This is problematic given SharedTree's charter of being the "one easy choice" for 90% of Fluid scenarios. Therefore, this document begins with the premise that the SharedTree DDS will need to expose a layering where the application or intermediate framework can define the application-facing representation of the tree data.

Approach

It can be challenging to imagine a "tree data structure" that does not itself build the tree of nodes exposed to the user. To see how this is possible, it's useful to start with a high-level description of how a DDS works:

The user describes the changes to be made to the DDS.
The DDS submits the change to the service that broadcasts it to clients.
Each recipient adjusts the change to account for knowledge the sender didn't have.
The change is applied to the local copy of the data and change event(s) are raised.

Note that this description works equally well for both the imperative-style Property DDS and the functional-style Whiteboard DDS, with the only interaction with the user's data occurring when the change is applied.

Some notes:

In imperative-style APIs, building a change description to submit to the Fluid service happens behind the scenes as a consequence of calling APIs like .set() or .commit().
Both the mutable and immutable APIs raise a changed event with a description of the adjusted delta that could be used to maintain/synchronize the reified tree of nodes externally to the DDS.

Given the above, the following minimal API is almost sufficient for an application or framework layer to synchronize a local tree of either flavor with the state represented by the DDS:

interface ISharedTree {
    // Submits the given changes to the user's local tree (and submits to Fluid service)
    apply(change: IChangeSet);

    // Informs the application that the local tree has changed in response to processing
    // ops from the Fluid service.
    on("change", (change: IChangeSet) => void);
}

The one missing piece in the above API is that the application needs a way to bootstrap the initial state of their local tree on connection. To maximize code-reuse, this API should probably return the initial tree state in IChangeSet form (i.e., an IChangeSet that contains only a single insertion with the tree contents).

interface ISharedTree {
    get(): IChangeSet
}

Discussion

The API sketch above is intended to illustrate a general approach and shouldn't be interpreted as a literal proposal. For discussion:

Impact of partial views and branches
Alternative means of communicating changes
Separate reader/writer capabilities (ala Tiny-Calc)
IChangeSet is sufficiently complex that we probably need to provide a builder API:
- Does the builder implicitly add reversibility data, or does the app provide?
- Does the builder return IChangeSet in object or binary form?
- Does builder verify correctness of change wrt. current context?
Does the "pun" of returning tree data wrapped as IChangeSet carry its weight?

Examples to work through:

JS Proxy-like
- Readonly w/separate build API
- Read/write w/reified tree as implicit context
Applications that want to lazily reify portions of the tree on demand (both sync for pre-fetched chunks and async from demand fetched chunks)

milanro commented 2 years ago

Hi @DLehenbauer, I am very interested in the discussion concerning reversibility data. It is an interesting idea to leave this responsibility to the application because otherwise, for some kind of actions (delete, modify), the builder / dds would have to know the current state in order to be able to generate the inverse. On the other hand, the validity of the inverse would then be solely in hands of app. I am also interested about the future of those inverses (M2). My understanding is that reversibility can work well even in the concurrent environment when we use simple operations like Insert, Modify and Delete. But how do you see the reversibility in case of complex, nested, concurrent operations using combination of move-out, move-in, delete, insert, modify, revival and clipboard actions. I wonder, how challenging would it be to compute the reverse in such case.

CraigMacomber commented 2 years ago

I think the DDS needs to know the tree structure

@milanro Since the dds has to know how to do merge resolution (or more specifically as least deal with reordering or concurrently edits including things like conflict detection), as well as write summaries, I have been assuming it would have to know the current state, it just might not store it in a format thats optimal for tree traversal.

How whiteboard and shared-tree do something like this design

Whiteboard + shared-tree currently roughly follow a similar design: whiteboard maintains a copy of the part of the shared-tree that it cares about in a format optimized for traversal, and integrated into the application's invalidation/observation tracking system.

shared-tree does still expose the data as a traversable tree, but the performance of traversing this tree is not optimal (it's still pretty fast, and we plan to improve it, but its not the fastest possible tree traversal due to the b-tree lookups).

On the subject of functional vs imperative style, whiteboard takes advantage of the stable ids shared-tree provides to its nodes to emulate an imperative editing experience when on top of a functional implementation. This approach does incur a performance cost, but that cost is just overhead for edits made by the local client, which should not be the bottleneck in most cases. Usually, the limit would be remote edits (we need to support higher remote than local edit rates, and local edits are easier to backpressure as well), or tree reading (application load and update after a batch of changes).

Additionally, the functional style API makes it easier to reason about multiple states at once. One use-case would is async edits which want snapshot isolation while running: it would be possible to allow this while having a view of the app that can still update. Viewing history or other branches side by side would also be practical. Another use-case would be summaries: right now they use a second copy of the DDS, but with shared-tree's functional copy on write internals, it would be easy to do a non-blocking (async) summary of a particular version of the tree without blocking updates to the app.

Thus, I think a simple change-based API like @DLehenbauer described is good, but we also want a way to look-up individual nodes and their children (so the app does not have to keep the full tree in their view, and can load new parts on demand, and also solves the initial load problem). Then I think we want to provide some helper libraries for building/viewing/editing trees with nice APIs based on that. If the user wants imperative editing styles, and it would be expensive to support directly (like in whiteboard's version), we can defer the cost of that until a local edit is actually being performed (ex: via a proxy based editable view).

Additionally, the helper library we provide for viewing a tree from a shared tree can also include a layer to schematize the data (provide a schema aware typed interface, handle out of schema data etc).

Thats the vision I've been working with and building withing whiteboard anyway.

Summary

This is the vision I have been working on with whiteboard. It's not the only approach (and maybe not the best), but I think it works well so far, and is work detailing here as a possible approach:

Use a functional style immutable representation inside shared-tree, optimized for change application ('forest' in the current code), and expose basic tree reader and change subscription APIs on top of that ('checkout' in the current code). Provide a separate library for providing an efficient reified view of the tree (lazily built as needed), with optional layers to provide:

schema aware typed API surface
out of schema data handling
imperative editing experience
integration for various frameworks' invalidation systems (ex: support for exposing to React, whiteboard's internal platform etc).

Currently whiteboard has such a library and factoring it out of whiteboard into a standalone thing (called 'schematize') is a work in progress.

WASM note

Designs like I proposed, and even more so for @DLehenbauer 's proposal make it practical to have one copy of the data inside the shared tree, optimized for its needs, and a second copy (of the subset the app cares about currently) in the app, communicating a simple change/inval/delta protocol. This approach happens to lend itself very well to implementing the actual shared-tree internals in WASM. Managing the tree internals in WASM is attractive for many reasons (more efficient memory use, better binary format handling, faster custom data-structures, multi-threading (ex: background summary generation)). I have experimented in this area and think it's worth trying.

milanro commented 2 years ago

@CraigMacomber Thank you for sharing these brilliant ideas. If I understand well, both partial view of the tree and even history could be retrieved from the b-tree+ storage which is very effective because it allows efficiently reusing unchanged parts of the tree within each new version. In other words, If I have giant tree of millions nodes and I change one node, it might result in maybe adding 4 b-tree nodes (new root, affected internal nodes and leaf node). I understand, that this way, it is possible to checkout any historical (possibly partial) state very efficiently.

I wonder whether beside checking out the state, we would be able to retrieve also deltas from any point of history. The use-case here is that our application also has persistence. Let's think that we are pumping the data from shared tree to our elastic search database where we can search nodes based on text ranking and various other attributes. Let's say that we remember the actual version applied to our elastic and after restart of our applicator, we want to start where we finished. The current philosophy of fluid is to compute summaries and get rid of old operations asap and we could miss some operations. The applying the full summary is of course not, what we want, it would mean deleting all our persistent data and creating them new. We would be happy to ask the tree: give me operations / deltas from this version.

One other nice feature is having deltas and inverses. This way we could navigate history there and back without checking out each state as we navigate and updating the full application view. Of course, the b-tree+ index would be used to initialize the start point.

I have a few more concerns how the complex operations would be communicated to the application but I suppose that this is discussion for M2 because M1 will handle only simple Insert, Modify and Delete which are simply applicable to the app structures. I am looking forward to these discussions once they are in scope.

CraigMacomber commented 2 years ago

If I understand well, both partial view of the tree and even history could be retrieved from the b-tree+ storage which is very effective because it allows efficiently reusing unchanged parts of the tree within each new version. In other words, If I have giant tree of millions nodes and I change one node, it might result in maybe adding 4 b-tree nodes (new root, affected internal nodes and leaf node). I understand, that this way, it is possible to checkout any historical (possibly partial) state very efficiently.

Thats the idea, though there are some extra costs. You might need to reapply edits from between summaries, which requires saving those edits in the summaries, and applying them may involve downloading unrelated parts of the tree. There are ways to make this efficient, but its non-trivial (it's very doable, but there are some design trade-offs to work through regarding applying edits to an incomplete tree, particularly regarding conflicting id detection). There are also some open problems around how to provide a good experience for actually deleting things in a destructive way (the same as deleting something from a git history, ex: before sharing the repo with some external group).

I wonder whether beside checking out the state, we would be able to retrieve also deltas from any point of history.

Yep. The existing experimental version of shared tree already supports this. You can recreate the historic states and compare them to compute their deltas. If we implemented the efficient historic state access described above, this would be quite fast (though still async unless you have the history loaded locally: we do support synchronous access to historic states from the current session though, which whiteboard uses for undo/redo).

I have a few more concerns how the complex operations would be communicated to the application but I suppose that this is discussion for M2 because M1 will handle only simple Insert, Modify and Delete which are simply applicable to the app structures. I am looking forward to these discussions once they are in scope.

If we used deltas from the internal B-trees generating by comparing two states, we get a list of new, modified and removed nodes. This is orthogonal to the editing API, so it won't have to change even if we add more advanced editing primitives in the future. Exactly how we expose to the app the contents/changes of the modified and new nodes is still up for debate. I suspect we will want a simple function-based getter API for the common case, and an optional one you can use to read nodes in their internal compressed format for efficiently handling of large amounts of uniform data.

For an application like the one you describe, you could checkout the last version it was synced to, then skip to the newest revision, and the shared tree could produce the proper invalidation (based on the delta), just like it will when real-time edits are applied. A nice thing about the comparison-based delta approach is you can generate deltas between any two views allowing navigating history or viewing live edits with the same update/invalidation logic.

milanro commented 2 years ago

You can recreate the historic states and compare them to compute their deltas.

Indeed, I see that the compare of 2 states, if implemented as a part of tree library, will produce stable and immutable deltas which application can easily consume. What do you think about a scenario where huge tree is processed but the changes are rare comparing to size of the tree.

Let's use our hypothetical elastic which is interested in all changes in the tree. The engine would still need to checkout the both versions of the whole tree and perform the full compare only to find out, that only a tiny part of the tree was changed.

Supposed, the deltas between versions are stable and also additive, what do you think about the idea of saving those changes solely between two neighboring versions. The compare of 2 states would have to be executed only once, on receiving the sequenced edit operations from fluid. That time I suppose that the partial state, which is relevant for computing the deltas is checked out anyway and the partial compare would be sufficient. The inverse change generation could also be saved which would not only allow us to apply deltas to our elastic but also enable us to navigate through history in our app very quickly. Of course, this approach still has challenges like how to support partial checkout and receive only changes for the partially checked state, how to optimize the changes (lazy) retrieval etc. Is this something what would come to your consideration?

DLehenbauer commented 2 years ago

@milanro - The new changeset format has the additive property you're looking for: given two consecutive changes C[3] and C[4], you can compute a new squashed change C[3..4] that accumulates the effects of both C[3] and C[4].

To support partial checkouts, it's important to us that computing C[3..4] does not require downloading any tree data. However, computing C[3..4] may require caching some preceding changes if the clients that generated C[3] and C[4] weren't fully up to date at. (i.e., we may need C[0..2] to adjust C[3] and/or C[4] so they have a common basis before combining them.)

Fortunately, the service can caps the size of this cache by requiring clients to "catch up to" a minimum sequence number before submitting changes. In an extreme case (e.g., someone sends a really huge op), the service can advance the minimum sequence number to the current sequence number, in which case the "really big op" wouldn't need to be cached after being applied.

milanro commented 2 years ago

@DLehenbauer thanks for the clarification. I went through the changeset design which already exists under packages/dds/tree and it is really detailed. I understand the squashing, I am not 100% following the need for caching preceding changes when clients are not up to date, would it mean that C[3] might happen when C[2] is not yet loaded on the client and the final squash will need all of them (does this mean that even C[2] is concurrent)? Anyway, the other question is, whether those changes will be persistent which means that when checking out really old state, we could reapply those old changes to move up and down without checking out next / previous states (this is the use-case when my app has persistent state, for example elastic DB (accidentally switched off a few days) and I want to apply only changes but our tree is huge and changes are tiny).

dstanesc commented 2 years ago

Related to @milanro comments, I would like to sketch the higher level scenario we believe to be common in the context of the SharedTree (ST) based applications.

It is our experience from the past that most of the data management use cases are typically I/O bound (however differently on the write and query side) and there is always a denormalization strategy which addresses the performance problems. In a CQRS like pattern the ST would be the write model, but query cases, for efficiency reasons, would have to be served inevitably by MaterializedViews (MV) derived / projected from the data stored in collections of STs.

Now such MVs would reside outside Fluid and have very different SLAs. There are 3 design aspects we believe to be ubiquitous:

Fluid high availability (HA) should be sufficient and the only prerequisite to ensure convergence to (eventual) consistency across collections of STs and including multiple MVs. This needs to happen independent of network outages and even software bugs to be corrected, meaning extremely variable MV availability
Efficient (reduced I/O, ideally not diff based) MV incremental updates. One possible pattern is that MVs would own/track internally the version of a synchronized ST state and be able to request the delta/change between its known/old ST state and current ST state. Even more, some data management use-cases need to capture multiple revisions of the data and it is very good possible that a particular MV will have to track ALL changes (rather than the squashed change).
Highly scalable and resilient MV indexing (million of STs or partial ST fragments) by design. A (persistent) notification mechanism, accounting for the fact that MVs may be unavailable at the notification time. Only when notified the MVs will request the deltas from a particular ST

DLehenbauer commented 2 years ago

Hi @dstanesc - Thank you for the context. We also have a similar system in mind where all writes are funneled through Fluid, but then asynchronously replicated to additional read-only data stores that support specialized queries and views (or just for increased scale.)

It's fuzzily captured as M7/M8. I'll drop a note in #8273 to remind myself to come back to your comment and borrow from your description as we fill in more details.

DLehenbauer commented 2 years ago

@milanro - I'm pretty sure the changeset format and our rebase algorithm supports what you want to do (esp. since we want to do the same kinds of things.)

You can take a consecutive sequence of changesets C[i] ... C[j] and squash them into a single self-contained changeset Cij that can be trivially applied to a tree in state S[i-1] to "fast-forward" it to state S[j].

However (as you guessed) the construction of Cij may require changes prior to C[i] due to concurrent editing.

milanro commented 2 years ago

Hello. I would like to discuss the ability to move the existing applications based on Property DDS to Tree DDS. I believe that this is also a relevant topic at the API discussions.

We are developing number of applications based on the Property DDS which in the future will need to use features that will be implemented only in Tree DDS. Based on this discussion, we would like to achieve that the code migration will be possible and as painless as possible. We would like to know your view how deep support we can expect concerning the migration.

There are 3 API topics relevant for the Code Migration as follows

Schema Definition
Binder Support
Property Classes

Schema Definition

Schema Mapping

Ideal approach : Support the exactly same schema definition
Minimal requirement : Property DDS schema can be fully mapped to Tree DDS schema

Mapping Support

For mapped solution, is some schema converter tool provided?
Support of all built-in types
Support of custom types
Support of types versioning
Support of Type Upgrade Policy : NONE, PATCH, MINOR, MAJOR

Binder Support

Binders are communication layer from Property DDS to the application. This is the way how Property DDS announces changes to application. Application heavily relies on binders. However the binder configuration is hidden from the application layer and that API can be changed. But functionality is essential.

We can register our callback functions on supported events
Supported events : modify, insert, remove, collectionModify, collectionInsert, collectionRemove, referenceModify, referenceInsert, referenceRemove
Binding registration based on the type (types migration is important also for this reason)
Binding registration based on the absolute / relative path

Property Classes

In order to change data in Property DDS there is an API based on various classes extended from the BaseProperty class. Application layer directly talks with objects of those classes, calls methods to create / modify or delete the data.

Could we have such API in Tree DDS?
Or would these calls be mapped easily?

DLehenbauer commented 2 years ago

Hi @milanro - I'm sorry that I don't have a prescriptive answer on migration yet.

We plan to have a preferred tree API and schema language (designed in M2 and implemented in M3) that we will recommend to the majority of SharedTree customers. However, we have a diverse set of customers with different interoperability requirements. It's not yet clear that every customer's needs can be met with a single high-level API.

For that reason, we've designed SharedTree to have a replaceable "API layer" with multiple implementations. This means that one potential path forward would be for you to port the existing Property DDS tree API to the new SharedTree core/engine. This might make a good "plan B" if timelines don't align or the M3 API strays too far from what you can absorb.

I think that in ~3 months there will be enough of the M1 core/engine layers implemented that you could start investigating a port of the property DDS API (if that's an interesting short or long-term direction for you.)

milanro commented 2 years ago

Hi @DLehenbauer - Thank you very much for the info and for giving direction. We will focus on building facade to support Property DDS API when the initial core layers are available. It looks to me that at least types of Property DDS schema would need some representation in Tree DDS so that we would be able to map them some way (as we have active instantiations and we will need to migrate them). What is your view here, could this be supported at the core level?

ghost commented 2 years ago

This issue has been automatically marked as stale because it has had no activity for 180 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!