Why not use MongoDB itself as an authority?

mitar commented 7 years ago

From quick reading of the code it seems like you are using Rocket.Chat streamer and make your own way to sync up multiple authorities (when horizontally scaling Meteor) together? Why this design choice? I always thought that it should be easy to integrate it using MongoDB. You just have a collection of documents representing steps, where you have unique index on (document ID, version), and then you insert steps into the collection from Meteor method which you call from the client side with new steps. If adding a step succeeds, it gets published to a client, which adds it to the editor. Am I missing something?

ejfrancis commented 6 years ago

Hey @mitar

Sorry for the very late response, as you can probably tell my work on this project has stopped as other things have consumed my time and I no longer need this for my application.

The main reason for choosing to have authorities keep all steps in memory instead of in the database is because every single keystroke or text modification is recorded as a step in the ProseMirror internal data model. Imagine a subscription to a collection that has 1,000 concurrent users each submitting on average 3 keystrokes a second. That would be 1,000x3x60=180,000 writes per minute that the Meteor server has to process as it tails the Mongo oplog. Now imagine your app gets popular on reddit or hackernews and sees a spike in traffic to 3,000 concurrent users. This would bring any server to a halt at scale. For that reason, I chose to keep an in-memory session of steps and have each user maintain a second DDP connection to whatever server has their session. Mongo is never hit while submitting or receiving steps so this is very fast. Then on a set interval (snapshotIntervalMs) that defaults to 5 seconds right now, the whole document is stored as a snapshot in Mongo which can be restored. I believe this is a good compromise for still keeping the document in Mongo, but allowing a more scalable approach to realtime edits.

Another option was to use Redis as a store for steps, but I wanted this to be a drop-in package that can be used without any extra dependencies. A Redis adapter for it would be an interesting idea though if you wanted that option.

mitar commented 6 years ago

Eh, no need to be sorry. I know how it is, when there is simply not enough time.

I do get what you are saying, but isn't then collaboration logic much harder? ProseMirror requires a serial index for steps to be enforced by the server, and so MongoDB provides that through its unique index (this is how I do it in my implementation). If you do not have that, then you need to get all clients to connect to the same node and have everything in memory, which is what you are doing through the second DDP connection, no? But then you assume that one node will be able to handle all clients for a particular document (session)? And you think about Redis to maybe address that?

Anyway, I implemented this through MongoDB in my app. My approach to scaling there is that I anyway do not think that having 1000 users on one document really works. So I am mixing real-time collaboration on smaller groups with GitHub-like forking and merging for larger groups.

But yes, probably as things will scale I will have to think about optimizations here.

prosemeteor / prosemirror

Why not use MongoDB itself as an authority? #40