phosphorjs / phosphor

The PhosphorJS Library
BSD 3-Clause "New" or "Revised" License

Datastore example app #410

Closed ian-r-rose closed 4 years ago

ian-r-rose commented 4 years ago

Needs a bit of cleanup, but this is a functional app with collaborative text editing, based on an earlier version from @vidartf

jasongrout commented 4 years ago

I'm experimenting with this, using a file about 590K (17k short play lines) containing a few things from Shakespeare. I started progressively deleting chunks of this file. After a few deletions (probably spanning 5k lines or so each?), I get a server error like:

Mon Aug 05 2019 15:52:53 GMT-0700 (Pacific Daylight Time) Store ID 5 disconnected. Reason: 1009: Frame size of 32075084 bytes exceeds maximum accepted frame size

That's a frame size of 32MB. That seems a bit large considering my original file size was around 0.5MB. Is that expected?

ian-r-rose commented 4 years ago

Good question! I'm not really sure what the expected overhead should be for the CRDT. But each character gets a unique ID, which appears to be a string of length 8. So that's almost an order of magnitude of overhead. Though your example has closer to two orders of magnitude for a UTF-8 encoded file. I do wonder if there is a bug in the websocket layer here -- sending patches should require less storage than the overall file.

We will definitely need to think about some of these scalability issues:

  1. How do we checkpoint things?
  2. Should we not allow collaborative editing of large files?
  3. What are the limits of the transport layer?

jasongrout commented 4 years ago

I played with it a bit more. I applied the following patch:

diff --git a/examples/example-datastore/src/server.ts b/examples/example-datastore/src/server.ts
index 557a83c6..96d92ec1 100644
--- a/examples/example-datastore/src/server.ts
+++ b/examples/example-datastore/src/server.ts
@@ -162,7 +162,7 @@ wsServer.on('request', request => {
       return;
     }
     let data = JSON.parse(message.utf8Data!) as WSAdapterMessages.IMessage;
-    console.debug(`Received message of type: ${data.msgType}`);
+    console.debug(`Received message of type: ${data.msgType}; ${Buffer.byteLength(message.utf8Data!).toLocaleString()} bytes`);
     let reply: WSAdapterMessages.IReplyMessage;
     switch (data.msgType) {
       case 'storeid-request':
@@ -202,8 +202,10 @@ wsServer.on('request', request => {
       default:
         return;
     }
-    console.debug(`Sending reply: ${reply.msgType}`);
-    connection.sendUTF(JSON.stringify(reply));
+
+    let replyString = JSON.stringify(reply);
+    console.debug(`Sending reply: ${reply.msgType}; ${Buffer.byteLength(replyString).toLocaleString()} bytes`);
+    connection.sendUTF(replyString);
   });

   // Handle a close event from a collaborator.
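The size logging added by this patch can be factored into a small helper. The sketch below is only an illustration: `utf8ByteLength` and `describeMessage` are hypothetical names, not part of the example server. It uses `TextEncoder` instead of `Buffer.byteLength` so that it also runs in a browser, and it pins the locale to `en-US` so the formatting matches the logs in this thread.

```typescript
// Hypothetical helpers, not part of examples/example-datastore.

// Size of a string in bytes once encoded as UTF-8.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Build the "<type>; <size> bytes" suffix used in the debug output.
function describeMessage(msgType: string, payload: string): string {
  const size = utf8ByteLength(payload).toLocaleString('en-US');
  return `${msgType}; ${size} bytes`;
}
```

For example, `console.debug(`Sending reply: ${describeMessage(reply.msgType, replyString)}`)` would reproduce the reply line from the patch.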

Then I put a 30k file in my paste buffer (good ol' shakespeare :). I pasted it into the document a number of times, and you can see the sizes of the patches kept increasing quite a bit each time I pasted the same 30k string. In the middle, I added one character (that's the several hundred byte message), then kept pasting. Then I deleted about half the file, then deleted the entire rest of the file. I've annotated the log below with // comments.

// paste 30k
Received message of type: transaction-broadcast; 1,268,005 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 2,109,170 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 3,267,595 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 4,376,264 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 5,125,654 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 6,509,601 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 7,615,614 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 186 bytes

// paste 30k
Received message of type: transaction-broadcast; 8,284,228 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 186 bytes

// insert one character
Received message of type: transaction-broadcast; 355 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 186 bytes

// paste 30k
Received message of type: transaction-broadcast; 4,041,473 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 5,407,557 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 186 bytes

// paste 30k
Received message of type: transaction-broadcast; 6,243,956 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 186 bytes

// paste 30k
Received message of type: transaction-broadcast; 7,560,826 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 8,001,110 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 8,843,058 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 8,893,291 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 8,960,872 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 10,262,576 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// add one character
Received message of type: transaction-broadcast; 644 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// paste 30k
Received message of type: transaction-broadcast; 11,588,272 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// Delete about half the file
Received message of type: transaction-broadcast; 71,278,174 bytes
Broadcasting transactions to: 1
Sending reply: transaction-ack; 190 bytes

// Delete the rest of the file
Received message of type: transaction-broadcast; 47,076,175 bytes

These message sizes are really concerning.
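
To put numbers on the growth, the log can be reduced to a list of patch sizes. This is a rough sketch: `receivedSizes` is a hypothetical helper, the line format is copied from the debug output above, and "30k" is assumed to mean 30,000 bytes of pasted text.

```typescript
// Hypothetical log summarizer; the line format is taken from the debug output above.
const RECEIVED = /^Received message of type: ([\w-]+); ([\d,]+) bytes$/;

// Collect the byte counts of all received messages of one type.
function receivedSizes(log: string, msgType: string): number[] {
  const result: number[] = [];
  for (const line of log.split('\n')) {
    const match = RECEIVED.exec(line.trim());
    if (match && match[1] === msgType) {
      // Strip the thousands separators before converting.
      result.push(Number(match[2].replace(/,/g, '')));
    }
  }
  return result;
}

const sampleLog = [
  '// paste 30k',
  'Received message of type: transaction-broadcast; 1,268,005 bytes',
  'Sending reply: transaction-ack; 190 bytes',
  '// paste 30k',
  'Received message of type: transaction-broadcast; 2,109,170 bytes',
].join('\n');

const broadcastSizes = receivedSizes(sampleLog, 'transaction-broadcast');
// Patch size relative to the pasted text (ASSUMPTION: each paste is 30,000 bytes).
const patchRatios = broadcastSizes.map(size => size / 30000);
```

Under that assumption the first paste is roughly 42 times the size of the pasted text and the second roughly 70 times.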

ian-r-rose commented 4 years ago

You're right, that does seem excessive (and inconsistent!). Looking into it...

sccolbert commented 4 years ago

If you're constantly pasting large text at the end of the file, I would expect the id overhead to continue to increase as you continue to create ids with larger dimensionality. Each dimension in an id has 48 bits (which is a lot) but it's not densely populated. It's not sparsely populated either, so these patch sizes still look large to me.

ian-r-rose commented 4 years ago

@jasongrout When I perform an operation similar to yours (pasting a ~40k file repeatedly), I don't see nearly the same increase in message size (though the overhead is still large, about 40x!)

Mon Aug 05 2019 17:19:22 GMT-0700 (Pacific Daylight Time) Connection accepted.
Received message of type: storeid-request; 89 bytes
Sending reply: storeid-reply; 148 bytes
Received message of type: history-request; 89 bytes
Sending reply: history-reply; 166 bytes
Received message of type: transaction-broadcast; 220 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,755,350 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,699,964 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,723,954 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,242 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,180 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,232 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,560,763 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 1,560,678 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 1,560,437 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 1,724,305 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,560,685 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 1,560,564 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 1,724,359 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,287 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,104 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,113 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,325 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,724,166 bytes
Sending reply: transaction-ack; 190 bytes

sccolbert commented 4 years ago

Outside of what may be causing this, there's certainly room for improvement with respect to id spans to handle large pastes, but we can address that later.

jasongrout commented 4 years ago

Also, I added a readme in this example directory:

# Phosphor Datastore example

## Build

Compile with `yarn run build:examples` in the Phosphor repo root directory.

## Run

Start the server with `node ./build/server.js`

Go to the address `http://localhost:8000` (or whatever port the server prints out that it is listening on).

jasongrout commented 4 years ago

> If you're constantly pasting large text at the end of the file, I would expect the id overhead to continue to increase as you continue to create ids with larger dimensionality.

I was pasting text in random places inside the file.

Ian, can you try picking random places in the file to paste?

sccolbert commented 4 years ago

@ian-r-rose that's more like what I would expect. You'll have at minimum 16 bytes of overhead per character (until we implement id spans).
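
As a back-of-the-envelope check of that lower bound, here is a minimal sketch. The constants come from figures quoted in this thread (an id of at least 8 characters, 16 bits per character in memory); the function names are hypothetical, and the ~40k paste is assumed to be 40,000 characters.

```typescript
// Figures quoted in this thread: an id is at least 8 characters,
// and each character of the id string takes 16 bits in memory.
const MIN_ID_CODE_UNITS = 8;
const BYTES_PER_CODE_UNIT = 2;

// Lower bound on id storage for `charCount` characters (no id spans).
function minIdOverheadBytes(charCount: number): number {
  return charCount * MIN_ID_CODE_UNITS * BYTES_PER_CODE_UNIT;
}

// How many times larger a patch is than the text it inserts.
function overheadRatio(patchBytes: number, textBytes: number): number {
  return patchBytes / textBytes;
}

// ASSUMPTION: the "~40k" paste is 40,000 single-byte characters.
const lowerBoundBytes = minIdOverheadBytes(40000);
const observedRatio = overheadRatio(1724242, 40000);
```

Here `lowerBoundBytes` is 640,000 and `observedRatio` is about 43, so the observed patches are larger than the id lower bound alone would explain. Some of the gap may come from the JSON encoding on the wire, which this sketch does not model.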

ian-r-rose commented 4 years ago

Ooh, @jasongrout I can reproduce what you see by pasting in the middle of the file, as you suggested:

Received message of type: transaction-broadcast; 1,279 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,279 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 1,755,620 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 3,094,110 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 4,519,486 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 5,741,282 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 5,571,838 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 7,674,794 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 7,686,337 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 9,219,505 bytes
Sending reply: transaction-ack; 186 bytes
Received message of type: transaction-broadcast; 9,268,659 bytes
Sending reply: transaction-ack; 190 bytes
Received message of type: transaction-broadcast; 9,234,554 bytes
Sending reply: transaction-ack; 190 bytes

sccolbert commented 4 years ago

A quick sanity check would be to log the average length of the ids generated for the file. Each character in the id string consumes 16 bits (on Chrome at least).
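
A minimal sketch of that check, assuming the per-character ids are available as an array of strings. How to obtain them from the datastore is not shown, and both helper names are hypothetical.

```typescript
// Hypothetical helpers: `ids` is assumed to hold one id string per character.

// Average id length in UTF-16 code units (what `string.length` counts).
function averageIdLength(ids: readonly string[]): number {
  if (ids.length === 0) {
    return 0;
  }
  let total = 0;
  for (const id of ids) {
    total += id.length;
  }
  return total / ids.length;
}

// Approximate in-memory size of the ids, at 16 bits per code unit.
function idStorageBytes(ids: readonly string[]): number {
  return averageIdLength(ids) * ids.length * 2;
}
```

Logging `averageIdLength(ids)` after each paste would show whether the growth in patch size tracks the growth in id length.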

ian-r-rose commented 4 years ago

Yes, by repeatedly pasting long blocks of text internally, it's not hard to generate some very long average id lengths (~50-100 characters).
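
To illustrate why interior inserts lengthen ids, here is a toy model of position identifiers. It is NOT Phosphor's id algorithm: the ids are plain digit arrays, `BASE`, `compareIds`, and `between` are invented for this sketch, and real ids carry far more bits per dimension. The only point is that once a gap between two neighbors is exhausted, the next id needs another dimension.

```typescript
// Toy model only; this is not how Phosphor generates ids.
const BASE = 16;
type ToyId = number[];

// Lexicographic order; a prefix sorts before its extensions.
function compareIds(a: ToyId, b: ToyId): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) {
      return a[i] - b[i];
    }
  }
  return a.length - b.length;
}

// Create an id strictly between lo and hi.
// Requires lo < hi and that neither id ends in a zero digit.
function between(lo: ToyId, hi: ToyId): ToyId {
  const out: ToyId = [];
  let hiConstrains = true;
  for (let i = 0; ; i++) {
    const a = i < lo.length ? lo[i] : 0;
    const b = hiConstrains && i < hi.length ? hi[i] : BASE;
    if (b - a > 1) {
      // Room at this depth: take the midpoint and stop.
      out.push(a + Math.floor((b - a) / 2));
      return out;
    }
    // No room: copy lo's digit and go one dimension deeper.
    out.push(a);
    if (b > a) {
      hiConstrains = false; // already below hi, deeper digits are free
    }
  }
}

// Insert ten times into the same gap, always directly after `leftId`.
const leftId: ToyId = [5];
let rightId: ToyId = [6];
const idLengths: number[] = [];
for (let i = 0; i < 10; i++) {
  rightId = between(leftId, rightId);
  idLengths.push(rightId.length);
}
```

In this toy, ten single inserts into the same gap take the id length from 2 to 4 digits. Pasting tens of thousands of characters into one gap would exhaust the space much faster.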

jasongrout commented 4 years ago

It sounds like we can do a lot to compress patch messages when you have ranges of text, which helps memory use in the browser as well as network bandwidth (I think at one point, the debugger stopped and said I was about to hit an out of memory error in applying a patch).

Newbie question: once the ids reach 50-100 bits, we have to deal with those large id sizes at least in that part of the file forever, right? No re-indexing?

sccolbert commented 4 years ago

It's not about compressing the patch messages, it's about compressing the ids into ranges. It's not exactly straightforward to implement, which is why I haven't done it yet. There's a deterministic algorithm to apply to ensure that the ranges can be split simultaneously by multiple users and still be merged out of order.

sccolbert commented 4 years ago

And you mean 50-100 characters, not bits, right? A single id is at minimum 16 bytes (128 bits): https://github.com/phosphorjs/phosphor/blob/master/packages/datastore/src/utilities.ts#L54

sccolbert commented 4 years ago

@jasongrout and I'm curious, have you run the same test on SMC?

vidartf commented 4 years ago

I think the issue exposed here turned out not to be related to the code of this PR. If so, let's leave this thread for discussing the code in the PR, and continue the load testing discussion here: https://github.com/phosphorjs/phosphor/issues/411

jasongrout commented 4 years ago

> @jasongrout and I'm curious, have you run the same test on SMC?

Similar, but not exactly. It wasn't an issue, IIRC. I'll run a similar test and report back.

On Google Docs, again, similar but not exactly, but IIRC, the patches were around 700k no matter what for pastes.

ian-r-rose commented 4 years ago

Superseded by #425