yjs / y-webrtc

WebRTC Connector for Yjs
MIT License

Implement Packet buffer for sending #20

Open disarticulate opened 3 years ago

disarticulate commented 3 years ago

Most browsers currently have a limit for message size:

https://stackoverflow.com/questions/15435121/what-is-the-maximum-size-of-webrtc-data-channel-messages

My testing in Chrome produces the following error: "Attempting to send message of size 988606 which is larger than limit 262144"

Although browsers are expected to implement the spec, these arbitrary size limits produce no error message that I can see in the console. The error above comes from running a debug version of Chrome.

When I tried to create my own 'sync' system before switching to try y-webrtc, I used protocol buffers to wrap updates, and hashed the data to keep them ordered/organized. I don't have any real knowledge about best practices, however.

dmonad commented 3 years ago

Hi @disarticulate ,

this is indeed a problem. Data channels in WebRTC really feel like an afterthought in many places.

One solution would be to write a wrapper around the WebRTC package "simple-peer" that handles splitting up messages. Larger messages simply need to be split up if they exceed a certain size (I wonder if it is possible to get/overwrite the message-size limit).
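On the question of reading the limit: the WebRTC API exposes it as `maxMessageSize` on the connection's SCTP transport (`RTCPeerConnection.sctp`). A minimal sketch of reading it defensively follows. The helper name `getMaxMessageSize` and the 16 KiB fallback are assumptions made for illustration, not part of y-webrtc or simple-peer.

```javascript
// Conservative default used when the browser does not report a usable limit.
// 16 KiB is an assumption for this sketch, not a value taken from any spec.
const FALLBACK_MAX_MESSAGE_SIZE = 16 * 1024

/**
 * Returns the largest message (in bytes) that should be passed to
 * RTCDataChannel.send() for the given RTCPeerConnection.
 * (Hypothetical helper, not an existing library function.)
 */
function getMaxMessageSize (pc, fallback = FALLBACK_MAX_MESSAGE_SIZE) {
  // pc.sctp is null until the SCTP transport has been negotiated.
  const reported = pc && pc.sctp ? pc.sctp.maxMessageSize : undefined
  // The value may be Infinity ("no known limit"), so only trust finite, positive numbers.
  if (typeof reported === 'number' && Number.isFinite(reported) && reported > 0) {
    return reported
  }
  return fallback
}
```

The value is only readable, not writable, so a wrapper would still have to split anything larger than what this returns.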

I wanted to write a wrapper around simple-peer anyway because different browsers often have trouble communicating with each other. Sometimes messages get lost although we use a reliable webrtc connection. Our wrapper around simple-peer should handle splitting up messages and making sure that no messages get lost (using a retry logic).

I imagine that we simply assign an increasing number to each message. Messages that are split have an additional increasing number that defines the part of the message.

This is how I would define the protocol. Internally, I'd probably simply encode this to Uint8Arrays using lib0/encoding. Protocol buffers is great, but it adds quite some overhead that I try to avoid (bundle size & mental complexity).

# Example of a "normal" message that is not split up
[normalMessageType, messageClock, ...message]

# Example of a split message
[splitMessageType, messageClock, numberOfMessageParts, partNumber, ...messagePart]

The peers would need to maintain a list of messages that they have not received yet. And of course, they would need to merge message parts when all parts have been received. For Yjs it is not necessary to apply messages in a certain order. Any order is fine. Messages just should not get lost.
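The framing described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the y-webrtc implementation: it uses a fixed-width header written with `DataView` instead of lib0/encoding so that it has no dependencies, and the names `encodeMessage` and `Reassembler` are hypothetical. Retry logic for lost messages is left out.

```javascript
const NORMAL = 0 // [normalMessageType, messageClock, ...message]
const SPLIT = 1 // [splitMessageType, messageClock, numberOfMessageParts, partNumber, ...messagePart]
const NORMAL_HEADER = 5 // 1 byte type + 4 bytes clock
const SPLIT_HEADER = 13 // 1 byte type + 3 x 4 bytes (clock, part count, part number)

// Turns one message into one or more packets, none larger than maxSize bytes.
function encodeMessage (clock, message, maxSize) {
  if (message.byteLength + NORMAL_HEADER <= maxSize) {
    const out = new Uint8Array(NORMAL_HEADER + message.byteLength)
    const view = new DataView(out.buffer)
    view.setUint8(0, NORMAL)
    view.setUint32(1, clock)
    out.set(message, NORMAL_HEADER)
    return [out]
  }
  const partSize = maxSize - SPLIT_HEADER
  if (partSize <= 0) throw new Error('maxSize is too small for the split header')
  const total = Math.ceil(message.byteLength / partSize)
  const packets = []
  for (let part = 0; part < total; part++) {
    const body = message.subarray(part * partSize, (part + 1) * partSize)
    const out = new Uint8Array(SPLIT_HEADER + body.byteLength)
    const view = new DataView(out.buffer)
    view.setUint8(0, SPLIT)
    view.setUint32(1, clock)
    view.setUint32(5, total)
    view.setUint32(9, part)
    out.set(body, SPLIT_HEADER)
    packets.push(out)
  }
  return packets
}

// Collects parts in any order and returns the full message once all parts arrived.
class Reassembler {
  constructor () {
    this.pending = new Map() // messageClock -> { parts, received }
  }

  // Returns the complete message as a Uint8Array, or null if parts are still missing.
  receive (packet) {
    const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength)
    if (view.getUint8(0) === NORMAL) return packet.subarray(NORMAL_HEADER)
    const clock = view.getUint32(1)
    const total = view.getUint32(5)
    const part = view.getUint32(9)
    let entry = this.pending.get(clock)
    if (!entry) {
      entry = { parts: new Array(total), received: 0 }
      this.pending.set(clock, entry)
    }
    if (entry.parts[part] === undefined) { // ignore duplicate parts
      entry.parts[part] = packet.subarray(SPLIT_HEADER)
      entry.received++
    }
    if (entry.received < total) return null
    this.pending.delete(clock)
    const message = new Uint8Array(entry.parts.reduce((n, p) => n + p.byteLength, 0))
    let offset = 0
    for (const p of entry.parts) {
      message.set(p, offset)
      offset += p.byteLength
    }
    return message
  }
}
```

A real wrapper would also need to remember which clocks have already completed, so that a retransmitted part of a finished message does not open a new pending entry.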

> When I tried to create my own 'sync' system before switching to try y-webrtc, I used protocol buffers to wrap updates, and hashed the data to keep them ordered/organized. I don't have any real knowledge about best practices, however.

One advantage of using Yjs/CRDTs is that you don't have to care about the order of messages. These messages simply have to arrive somehow at the other peers.

disarticulate commented 3 years ago

I looked around for some prior art, and this appears to be the only wrapper around simple-peer that overcomes the issue:

https://github.com/disarticulate/simple-peer-files

The simple-peer-files/src/Meta.ts implements a similar protocol to what you describe.

I forked it to see how small it could be bundled, including making simple-peer a peerDependency. Without @feross/buffer, it came out to ~38 KB (compressed, I believe) and ~110 KB uncompressed. It looks like they're using some heavy streaming libraries, so I'm not sure how to interpret 'bundle size', but I'd guess a lot of that duplicates what you've done with lib0.

The other thought I had: with Yjs/CRDTs, is there any way to 'naturally' spit out smaller/chunked updates with some kind of flag? This would probably ruin the advantage of out-of-order updates, to the extent that you'd need to mutex-lock updates until a split message has finished sending.

For now, I'm downsizing my documents, moving the media/large segmented parts into hashes, and seeing if simple-peer-files works well enough to do the heavy lifting and recombine everything on the other side.

dmonad commented 3 years ago

> I'm not sure how to interpret 'bundle size', but I'd guess a lot of that duplicates what you've done with lib0.

Yjs uses lib0/encoding anyway, so I'd like to avoid other encoding libraries if possible. It seems a lot of people are focused on protobuf ^^ https://github.com/yjs/yjs/issues/262 - I explained my reasons for not using protobuf in Yjs there.

It seems that WebRTC doesn't always guarantee in-order delivery. So the new protocol should account for that. Simply describing the end of a message only works when the protocol guarantees in-order delivery.

> The other thought I had: with Yjs/CRDTs, is there any way to 'naturally' spit out smaller/chunked updates with some kind of flag? This would probably ruin the advantage of out-of-order updates, to the extent that you'd need to mutex-lock updates until a split message has finished sending.

There is. You can basically split up Yjs documents into smaller update messages. But when you insert one huge JSON/binary blob into Yjs, the smallest update unit might still be too large for WebRTC. I don't think we can get around splitting messages.

disarticulate commented 3 years ago

my webrtc buffer protocol was to:

  1. hash the data
  2. chunk the data, then calculate the hashes of the chunks
  3. wrap each chunk in a protobuf with the packet # and metadata, particularly the final hash
  4. receive packets in whatever order, then reassemble until the hash matches. So no 'technical' order was necessary; not knowing the packet numbers would make reassembly expensive, but not impossible.
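The four steps above can be sketched roughly like this. It is an illustration only: plain objects stand in for the protobuf wrapper, SHA-256 from Node's `node:crypto` stands in for whichever hash was actually used (a browser would use SubtleCrypto instead), and the names `makePackets` and `reassemble` are hypothetical.

```javascript
// Node-only import for the sketch; the original ran in the browser.
const { createHash } = require('node:crypto')

const sha256 = (bytes) => createHash('sha256').update(bytes).digest('hex')

// Steps 1-3: hash the whole payload, chunk it, and wrap each chunk with
// its packet number, its own hash, and the final hash of the payload.
function makePackets (data, chunkSize) {
  const finalHash = sha256(data)
  const total = Math.max(1, Math.ceil(data.byteLength / chunkSize))
  const packets = []
  for (let index = 0; index < total; index++) {
    const chunk = data.subarray(index * chunkSize, (index + 1) * chunkSize)
    packets.push({ index, total, finalHash, chunkHash: sha256(chunk), chunk })
  }
  return packets
}

// Step 4: accept packets in any order; return the payload once every chunk
// is present and the reassembled bytes match the final hash, otherwise null.
function reassemble (packets) {
  if (packets.length === 0) return null
  const { total, finalHash } = packets[0]
  const slots = new Array(total)
  for (const p of packets) {
    if (p.finalHash !== finalHash) continue // belongs to a different payload
    if (sha256(p.chunk) !== p.chunkHash) continue // corrupted chunk, wait for a resend
    slots[p.index] = p.chunk
  }
  for (let i = 0; i < total; i++) {
    if (slots[i] === undefined) return null // still incomplete
  }
  const out = new Uint8Array(slots.reduce((n, c) => n + c.byteLength, 0))
  let offset = 0
  for (const c of slots) {
    out.set(c, offset)
    offset += c.byteLength
  }
  return sha256(out) === finalHash ? out : null
}
```

Note that a hash shipped alongside the data detects accidental corruption; detecting deliberate modification would additionally require the hash to be authenticated (for example signed or sent over a trusted channel).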

Hashing was used because I semi-expect to have an insecure network and wanted my packets not to be modified, but right now it's just syncing device documents.

I think the problem is definitely WebRTC, but I could imagine a benefit to standard 'update sizes' via an intelligent chunking function within the core, as abstractly it seems that's what you're doing when you're moving updates left or right. A buffer is just a bunch of updates to the right. It just loses the advantage while it's trying to do that update.

anyway, I'm deep into my application layer and cannot provide much other than presenting things I've found along the way.

holtwick commented 3 years ago

Hi, I would like to join the discussion with a question: If I'd like to send a bigger file like an image, I guess it doesn't make sense to wrap that in a Y.Doc?

If that's true, what would be the best way to share such a file among peers? Usually I would send some request to a peer to send me the binary data using a DataChannel. Is that correct?

Can we extend y-webrtc to support exchanging additional data formats? Would it be possible to use the same encryption?

dmonad commented 3 years ago

In the current state, y-webrtc apparently can't handle large files (depending on the browser being used).

Managing this manually would be pretty hard because you need to coordinate where to get the file from. y-webrtc supports partially connected networks (not every client is connected to every other client).

Therefore, it might make sense to put the image in a subdocument. Then Yjs can handle syncing the image asynchronously. There should be close to no performance overhead if you store the image as a Uint8Array somewhere in Yjs.

dmonad commented 3 years ago

Another nice alternative is to use webtorrent (for large files).

disarticulate commented 3 years ago

If you use any array, you would need to chunk the transaction to ~16KB to maximize transmission, according to some testing I've seen. Past 64KB, certain browsers silently fail to send.
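A rough sketch of sending in ~16 KB chunks while respecting the channel's send buffer is shown below. `bufferedAmount`, `bufferedAmountLowThreshold`, and the `bufferedamountlow` event are standard `RTCDataChannel` features; the helper names `splitIntoChunks` and `pump` and the 1 MiB high-water mark are assumptions for illustration. Only the sending side is shown, so the receiver would still need to reassemble the chunks.

```javascript
const CHUNK_SIZE = 16 * 1024 // ~16 KB, the figure quoted above
const HIGH_WATER_MARK = 1024 * 1024 // hypothetical: pause once ~1 MiB is queued in the channel

// Splits a Uint8Array into views of at most chunkSize bytes (no copying).
function splitIntoChunks (data, chunkSize = CHUNK_SIZE) {
  const chunks = []
  for (let offset = 0; offset < data.byteLength; offset += chunkSize) {
    chunks.push(data.subarray(offset, offset + chunkSize))
  }
  return chunks
}

// Sends queued chunks until the channel's buffer reaches the high-water mark.
// Returns the number of chunks still waiting, so the caller knows whether to
// call pump() again when the buffer drains.
function pump (channel, queue, highWaterMark = HIGH_WATER_MARK) {
  while (queue.length > 0 && channel.bufferedAmount < highWaterMark) {
    channel.send(queue.shift())
  }
  return queue.length
}

// Browser usage (channel is an open RTCDataChannel, bytes a Uint8Array):
//   const queue = splitIntoChunks(bytes)
//   channel.bufferedAmountLowThreshold = CHUNK_SIZE
//   channel.onbufferedamountlow = () => pump(channel, queue)
//   pump(channel, queue)
```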


holtwick commented 3 years ago

Thanks, @dmonad and @disarticulate for the valuable feedback. I will test the solutions you mentioned once I get to the implementation of that feature in my project. I'll give feedback on the outcomes.

To summarize the solutions you proposed:

I would add another solution, for my special use case, involving a stupid web server to upload the data once and clients fetching from there.

disarticulate commented 3 years ago

I created a monkey patch (a hack) for SimplePeer here:

https://github.com/disarticulate/y-webrtc/

I did the following:

  1. Extended SimplePeer's class as SimplePeerExtended.js
  2. Overwrote the import in y-webrtc.js to use the extended version
  3. Created two Y.Docs, one for transmission (txDoc) and one for receiving (rxDoc)
  4. Created an initial setup and sync transmissions for peers
     a. client1 syncs: txDoc -> rxDoc (one way)
     b. client2 syncs: txDoc -> rxDoc (one way)
  5. send(chunk) -> queues data, creates more chunks with packets, and pushes each packet into an array in the txDoc
  6. txDoc.on('update') -> sends msg to sync
  7. rxDoc is updated with msg
  8. Upon receipt of all packets, this.push is triggered

It reuses Yjs and no outside packages. It may serve as a design guide for something more economical. Also, I believe the WebRTC spec doesn't guarantee order of transmission, so the CRDT algorithm does some work here. Otherwise we're just using the nicely encoded dataset given by 'update'.

martinpengellyphillips commented 1 year ago

I just encountered this, and it took a while to determine the issue. What happened in my case is that syncing in Firefox worked, but syncing the same document in Chrome suddenly started failing (having worked previously). I eventually narrowed it down to a size issue where a particularly large update was silently breaking y-webrtc for Chrome.

A few questions:

Thanks!

disarticulate commented 11 months ago

@martinpengellyphillips here's the pull request: https://github.com/yjs/y-webrtc/pull/25. I think some of the feedback is about better integration with @dmonad's approach and comments.

As far as I know, this is just how webrtc is going to handle things. Another solution would be to figure out how to ensure all updates using webrtc are already a max size before using the pipe.

andre-dietrich commented 9 months ago

@disarticulate ... Thanks for your efforts. I used your fix as an alternative WebRTC provider and it works like a charm. I tested it on different browsers, with images and even video files ...