streamich / json-joy

JSON CRDT, JSON CRDT Patch, JSON Patch+, JSON Predicate, JSON Pointer, JSON Expression, JSON Type
https://jsonjoy.com/libs/json-joy-js
Apache License 2.0
752 stars 14 forks source link

53-bit SessionID collision risk #675

Closed tsukkiren closed 2 months ago

tsukkiren commented 2 months ago

Hi, this library looks fantastic, but one thing holding me back is the SessionID size.

Why not use a full 128-bit UUID? Unless there is a central server coordinating everything, it seems possible for two forks of a document to end up with two identical SessionIDs that then cannot be later merged.

streamich commented 2 months ago

Hi @tsukkiren, the reason is because storing and working with 128-bit UUID is inefficient. There are storage costs, but also runtime costs: all algorithms need to constantly do sid comparisons and having a sid in a number is the most efficient way.

Also, runtime memory costs, all document nodes and RGA chunks have an ID assigned to them, when a document with thousands of nodes is loaded, it is thousands of nodes times the number of bytes the sid consumes. If the sid is represented by UUID I reckon it woud use at least 10x more memory (be they stored in Uint8Array or [number, number, number, number] 4-tuple).

The likelihood that two sids can collide is 2 ** 53 = 9007199254740992, even if they collide it does not necessarily mean that the document gets into a corrupted state. I have not tested it, but in theory it should get corrupted only if two peers generate the same sid AND edit the same part of the document (conflicting change) AND do it concurrently AND the changes are differnet.

If there is a central server, non of that applies; the server just enforces the Patch order and contents. The library even has the ability to run the JSON CRDT in the "server-clock" mode, where sid is set to 1 for all peers.