xi-editor / xi-editor

A modern editor with a backend written in Rust.
https://xi-editor.io
Apache License 2.0
19.82k stars 696 forks source link

Multi-device editing/syncing with CRDTs #250

Open trishume opened 7 years ago

trishume commented 7 years ago

This is a tracking issue/roadmap/plan for how I'm going to proceed with adding the ability to synchronize documents. The tasks below correspond approximately to PRs I plan on writing and the order I plan on doing them.

Edit 2017/08/04: πŸ“ πŸ‘ I wrote a detailed document describing the CRDT

Edit 2017/05/04: Add more detail. Edit 2017/05/04: Add ideas for optimizing Edit 2017/07/12: Heavily refactored list to reflect what actually happened, include links to more PRs, and update plan

cc @raphlinus

xi-rope refactoring

Prior to starting this project, the rope and mini-CRDT data structures were mostly undocumented, lightly tested and used some representations that didn't have good time complexity or didn't fit well with turning the mini-CRDT into a full CRDT. The first step of the project was to refactor and replace some of the existing data structures and algorithms and comment and test them.

Fuchsia last-write-wins prototype

As an initial use case of this functionality, implement syncing using ledger for fuchsia/xi.

CRDT Merge operation

Implement the merge function on Engine for histories not involving undo.

Integrate CRDT into Fuchsia

You can now open Xi on two Fuchsia devices and type into both of them concurrently and they will synchronize and reach a consistent state shortly afterward.

Documentation

Improve CRDT robustness and functionality

Make it fast

The above will have terrible awful scaling properties and will slow to a crawl in many use cases. The next step is to figure out ways to improve the CRDT representation and operations to have better time and space complexity. Some of these ideas are well thought out and are clear wins, others are more speculative half-baked ideas (marked by a "πŸ€” ").

trishume commented 7 years ago

I discussed this a bunch with @raphlinus today and now have a much better understanding of the whole CRDT model and how to proceed. I've included some (very high-res page-bloaty) pictures of the whiteboard from our discussion along with some quick notes. I've edited the main issue body based on the discussion with new items.

cc @cmyr @rkusa since both of you expressed interest in CRDT stuff

Example of a revision history, right column is the deletion set at every revision, identical concurrent deletions (last two) necessitate multi-set (each deleted index also has a count) for reversibility so that undoing/reversing one doesn't un-delete a multi-deleted character. Things along the bottom are state in the engine for the head revision: img_20170504_140057 top right is the contents of a revision currently: img_20170504_140104 example of a non-trivial merge (bot-left) and half of a diagram showing correspondence between ops (which include the inserted or deleted string) and revs (which only include index sets and rely on text and tombstone ropes stored in the engine for the head revision): img_20170504_151728 other half of diagram, plus drawings of how one might efficiently represent histories and ropes as key-values so that only a limited amount of keys/data are changed on update: img_20170504_151721 example of correspondence between deletion set and function from index to element, with set of non-primes corresponding to function of nth prime: img_20170504_140124

rkusa commented 7 years ago

Thanks for the update!

I have a question to the first image. Why is the deletion set at line three (ab ins={1} {1}) {1} and not {2}?

Another question regarding the tombstones. Since they are not part of the revision, we only have the most reason tomstone string, right?

I've added the tombstone state to each line of your first example (and also added an additional starting line for the sake of my following question).

(1) 123   del={0, 1, 2}   {0, 1, 2}
      text = "" tombstones = "123"
(2) a     ins={0}         {1, 2}
      text = "a" tombstones = "23"
(3) ab    ins={1}         {1}
      text = "ab" tombstones = "3"
(4) axb   ins={1}         {}
      text = "axb" tombstones = ""
(5) ab    del={1}         {1}
      text = "ab" tombstones = "x"

Is this the correct way of how tombstones are going to work? If yes, when I undo all of these revision ((5), (4), (3), (2) and (1)), how do we know that we have to restore 123 (because 123 it is not in the tombstone string anymore)?

trishume commented 7 years ago

The leftmost deletion set, also known as from_union or the interleaving, is the characters deleted to get the final string. To be clear in the whiteboard diagram, the rightmost column is things that can be computed at the current time after all the revisions based on information at the bottom rather than stuff stored in a revision or computed at that time. So since the union string at the bottom is "axb", the deletion set to get "ab" is {1}.

Let me rewrite your example to be correct. Below the first line is in the revision, and the next line is the contents of Engine just after that revision is made. I included both the text,tombstones representation and the union representation, but in reality only one would be included. For the union, from_union is just the characters to delete to get text, and for text,tombstones you can get the union by using from_union to determine an interleaving. You also need from_union to figure out indexes to put in revisions since the indexes are based on the union string. The text,tombstones representation is an optimization of union that is more complex but also more efficient. I also included back_computed_deletions_from_6_union which is not something known at each time but something that can be computed at the very end, possibly for undo processing or something, it corresponds to the leftmost column of the drawing, I don't know if it is actually useful, it's just something we talked about to make sure I was understanding it right.

()   before any revisions are made
      text = "" tombstones = "" union="" from_union={}
      back_computed_deletions_from_6_union = {0,1,2,3,4,5}
(1) ins={0, 1, 2}
      text = "123" tombstones = "" union="123" from_union={}
      back_computed_deletions_from_6_union = {0,1,2}
(2)  del={0, 1, 2}
      text = "" tombstones = "123" union="123" from_union={0,1,2}
      back_computed_deletions_from_6_union = {0,1,2,3,4,5}
(3) ins={0}
      text = "a" tombstones = "123" union="a123" from_union={1,2,3}
      back_computed_deletions_from_6_union = {1,2,3,4,5}
(4) ins={1}
      text = "ab" tombstones = "123" union="ab123" from_union={2,3,4}
      back_computed_deletions_from_6_union = {1,3,4,5}
(5) ins={1}
      text = "axb" tombstones = "123" union="axb123" from_union={3,4,5}
      back_computed_deletions_from_6_union = {3,4,5}
(6) del={1}
      text = "ab" tombstones = "x123" union="axb123" from_union={1,3,4,5}
rkusa commented 7 years ago

Now, it makes sense. Thanks a lot for the detailed answer!

erlend-sh commented 5 years ago

Might this become compatible with the CRDT version control system Memo that @as-cii and @nathansobo are working on in xray?

cmyr commented 5 years ago

@erlend-sh These two projects aren't particularly related, so I don't think it makes too much sense to think of compatibility; however my understanding is that Memo is designed to be a standalone system that exposes an API that can be used by any consumer, so supporting it in xi at some point in the future should definitely be possibly. It's certainly an interesting project, and I've been following along with interest. :)

ul commented 4 years ago

Is there any ongoing work for this project? I'm especially interested in "Support Undo operations in the CRDT". Is anyone having a deep Engine understanding keen to work on it with me? I also could offer a bounty for this implementation.