multihack / multihack-web

Realtime collaboration for programmers. (Web Version)
https://multihack.github.io/multihack-web/
MIT License
93 stars 18 forks source link

Make Multihack CCI compliant #34

Closed t-mullen closed 7 years ago

t-mullen commented 7 years ago

So far Multihack has managed fine with optimistic edits (due to low latency, not many consecutive users), but eventually we should move to a proper CCI model.

To be a provably correct realtime collab system, Multihack must meet the following criteria:

  1. Convergence: All users eventually have the same set of documents.
  2. Casuality Preservation: All edits are applied in the same order they are generated.
  3. Intention Preservation: The actual effect of any edit is the same as the intended effect.

Thanks @kifhan for originally pointing out that there's a bit more theory behind this than I originally thought.

t-mullen commented 7 years ago

This gets much trickier with the P2P model. Any help (or links to relevant whitepapers) is appreciated.

CRDT seems to be the way to go:

WOOT paper: https://hal.inria.fr/inria-00071240/document Good article: https://medium.com/@raphlinus/towards-a-unified-theory-of-operational-transformation-and-crdt-70485876f72f Woot's tombstones get out of control. Not a fan of what is essentially a memory leak.

kifhan commented 7 years ago

check this. dmonad summarize ot libs. http://stackoverflow.com/questions/2043165/operational-transformation-library

I'm considering this libs. WOOT could be a good choice too. https://github.com/y-js/yjs

Also you may want to check this one too. Oh you already mentioned. :D https://github.com/google/ot-crdt-papers

t-mullen commented 7 years ago

We need some kind of CRDT model because we can't use the central server to limit concurrency. So that rules out a lot of OT solutions in that list. :/

This looks promising if I can find a way to abstract it from their "Doc" implementation. (I don't want to maintain two copies of every document.) https://github.com/dominictarr/crdt

kifhan commented 7 years ago

Agree. Y-js is also CRDT. It's already implemented so may help you easy to code.

t-mullen commented 7 years ago

Y-js is a little heavy for what I need to do. It provides connectors (which I need to have control over) and databases (which I don't want at all, I need to use the native filesystem).

I'll definitely look at their implementation though.

t-mullen commented 7 years ago

Seems like LSEQ might be a better option. Every implementation of WOOT I've seen has serious issues with memory or has some complex garbage collection (because of the tombstones).

paper

t-mullen commented 7 years ago

I think I have an idea on how to implement this. Might hack together a prototype for the web version in the next few days.

Basically the idea is this:

Obviously there is a lot of work to be done, but this seems to be the best option.

*I don't like the idea of needing an intermediate model (because of the memory overhead), but it seems there is no way around it. These sequences will need to be stored as metadata somewhere in the extension and can be wiped on a hard sync, so we may need some kind of rolling consensus check to clear out documents that aren't being written to anymore.

I'm pretty new to this topic, so feel free to point out any optimizations that can be made.

t-mullen commented 7 years ago

As for the mentioned libs, it doesn't seem like there are any easy solutions to this. Y-JS is nice, but it assumes things that I have no control over. (ie, it needs direct access to the editor model, provides a custom transport, provides a database). Many of Multihack's features would be impossible with Y-JS. Other "turnkey" solutions have similar issues.

t-mullen commented 7 years ago

Another solution that was suggested to me is simply sending lock/ack messages for the first edit on a line, then an unlock message after a timeout or when a new line is edited. This approach only works because the document model is separated into lines and latency is low, but is way simpler and still is CCI.

I think I'm going to try this first. It might lead to a lot of reverts when many users are editing a line, but seriously, if more than a few people are all editing the same line simultaneously you're going to have a bad time just by the nature of realtime programming, regardless of the solution.

Going to run some tests with simulated network delay to see how critical this is. I haven't noticed any issues with conflicts before in normal testing.

kifhan commented 7 years ago

I'm currently attaching Y-js to Multihack for my personal project. Use Y-Map ( file path, value - Y-Text) to share file system, Y-Text to share contents. It is not using socket.io but only webrtc. I'll report you a result later.

t-mullen commented 7 years ago

Y-Text is fine for Codemirror and Brackets, but VSCode doesn't allow access to the Monaco editor and Atom has a custom editor. Storing the whole project in memory with Y-Map isn't really an option for larger projects. Interested in seeing what you accomplish though.

kifhan commented 7 years ago

I've finished attaching yjs on Multihack-web. https://github.com/kifhan/Rellat-otter Check index.js, editor.js and filesystem on ./mhweb/src/ I didn't touch interface.js and didn't use remote.js You may test it on http://ide.rellat.com:3000/ It take some time to load when initiating room. notice that init room with zip project file not working right now. I think I'm going to rewrite y-js part later. It's kind of tricky to work with it. If you found any bug or problem, feel free to talk.

t-mullen commented 7 years ago

Hmmm, this is really neat! I'm surprised how little code is required for Y-JS. Unfortunately, I still have the same reservations about using it. It abstracts too much away.

You've lost the ability to send additional data (highlighting, bring-user, remote carets, voice chat etc) because Y-JS has hidden the WebRTC connection. Also, you've lost ability to add a socket forwarding fallback for non-webrtc browsers. I can see this being fixed with a custom connector or some kind of metadata on the replicated model though.

Y-JS uses an in-memory data store, which is fine for multihack-web, but wouldn't work at all for either of the extensions. I see Y-JS allows custom database's but I think those need to store the actual CRDT model, and don't offer any sort of connection with a traditional filesystem. Maybe you could reconstruct the tree, but it would be very difficult. You would need this to save too (notice your saved zips are just the CRDT model, which is useless to an end user.)

You're also modifying the editor instance, which I can't do in VSCode at all. I only have access to Codemirror-like deltas.

If it works for your usecase, then great! But I think it will be a bit difficult to implement in the rest of multihack without sacrificing important features. I'll definitely be using this as a reference later.

t-mullen commented 7 years ago

Where are you binding the model to the tree view? I don't see anything for that anywhere?

kifhan commented 7 years ago

oh, It is on index.js dir_tree: 'Map' Y-Map type can hold another Type like Y-Map, Y-Text, Y-Array, etc. I use it like this. FileSystem.yfs = y.share.dir_tree;

t-mullen commented 7 years ago

This is what you mean by index.js?

Seems like it's just launching a headless browser and serving the page?

var express = require('express'),
    app = express(),
    path = require('path'),
    server = require('http').Server(app);
app.set('port', process.env.PORT || 3000);
app.use('/', express.static(path.join(__dirname, 'mhweb')));
app.use('/bower_components/', express.static(path.join(__dirname, 'bower_components')));
// Set up express server
server.listen(app.get('port'), function(){
    console.log("Express server listening on port " + app.get('port'));
});

var childProcess = require('child_process')
childProcess.execFile('google-chrome', [
    '--headless','--disable-gpu','--remote-debugging-port=9222'
    ], function(err, stdout, stderr) {
        // handle results 
});
childProcess.execFile('node', [
    '--inspect=9222',path.join(__dirname, 'primary-peer.js'),
    'http://localhost:3000'
    ], function(err, stdout, stderr) {
        // handle results 
});
kifhan commented 7 years ago

oh no, sorry. index.js on ./mhweb/src

t-mullen commented 7 years ago

Ohhhh ok, I was looking at the completely wrong code then :joy:

This is very promising. With a custom connector, I could see it working out well for multihack-web at least.

Am I correct in noticing you never call FileSystem.getFile(file_path).write( content ) because all of the data is stored in the Y-JS model? If you can figure out how to keep the File objects consistent with the model, we could probably use this approach for the extensions too. This might mean listening to update events on the Y-text?

There's still the issue of needing to bind to the Codemirror instance instead of just using the change deltas. Do you think it would be possible? It would mean updating the model on the Editor._onchange event.

t-mullen commented 7 years ago

It actually might be possible to use the deltas with some modification to Y-Text. They use them internally, we just need to expose them.

https://github.com/y-js/y-text/blob/master/src/Text.js#L167

kifhan commented 7 years ago

Yes. correct. And Yes, we could. I'm thinking about forking Yjs.

t-mullen commented 7 years ago

It actually looks very simple. All Y-Text is doing is capturing these events and calling replaceRange, exactly like we do now. I think this is fine.

I'm going to look into custom connectors, and if we can add any messages we want over the sync stream, this will actually work.

All that needs to be figured out is if we can access the file content when those add/delete events occur on the tree.

Thanks for doing the work on this @kifhan.

t-mullen commented 7 years ago

Also, would you like to be added as a maintainer? You clearly know what you're doing 👍

kifhan commented 7 years ago

Sure, gladly. I appreciated.

t-mullen commented 7 years ago

Great!

Quick question: Why do you need these timeouts here? https://github.com/kifhan/multihack-web/blob/5483528ca1a936ab778bfdd4c04d175466f7f8cd/src/index.js#L162

kifhan commented 7 years ago

yeah. because when set Y type like Y-Map, Y-Text are created aync. But they don't give me a Promise return. So I made timeout to work around.

t-mullen commented 7 years ago

Is event.value not the YText instance that you are trying to get?

kifhan commented 7 years ago

Yeah, that's tricky part. Actually event.value is just string describing type. Must use get(key) to get Actual Y-Text object. Wait I'm confusing. Let me check again. :dizzy_face: Sorry. Seems like that was unnecessarily. I'll remove timeout on mhweb/src/index.js timeout on editor.js is for waiting async of Y-Text.

t-mullen commented 7 years ago

Got y-text working with Codemirror deltas and figured out how to load an existing project in.

Only thing left to do is swap out the connector (for additional messages, voice and because they aren't throttling the connection properly.)

kifhan commented 7 years ago

I'm seeing that you connecting Y-Text to CodeMirror Doc object. Is it correct? I guess the reason why you using Doc is save selection highlighting and undos. That's good. Should we use gitter or slack or something?

t-mullen commented 7 years ago

Good idea: https://gitter.im/multihack/multihack

t-mullen commented 7 years ago

The work for the web version is done (with all the same features whaaaa).

Won't be able to release until the extensions are finished too, but you can fork the Y-JS branch if you don't care about that.

99% it's working, but if you see any bugs please report them here.