xaviergonz / mobx-keystone

A MobX powered state management solution based on data trees with first class support for Typescript, support for snapshots, patches and much more
https://mobx-keystone.js.org
MIT License
554 stars 25 forks source link

yjs Map not recommended for key-value data. #535

Open collin opened 11 months ago

collin commented 11 months ago

I was looking at the recent release and saw there are yjs bindings. I've just spent some time building out yjs bindings for a mobx-keystone backed application.

One thing I encountered was using Y.Map for simple props can lead to very inefficient data storage. This is mentioned somewhat out-of-the-way in the yjs/y-utility repository.

The specific case I had was a model storing positional data: ~like

@model("Point")
class Point extends Model({
  x: tProp(types.number, 0),
  y: tProp(types.number, 0)
}) {
  @modelAction
  moveBy(x: number, y: number) {
    this.x += x;
    this.y += y;
  }
}

Due to the lack of data pruning in Y.Map while doing interleaved writes of plain values, a relatively short editing session on the order of minutes would lead to storing hundreds ok kb of data.

When I switched to using the YKeyValue utility, the stored size of my documents stays very small. I would be quite happy to use the new official bindings, but would need to find a way to use YKeyValue for plain values. Is that something you think could be ~easily supported by the official bindings?

xaviergonz commented 11 months ago

Thanks, I didn't know that existed

Looking into it it seems it only supports primitives as values like you said, which would be quite limiting 😔 https://github.com/yjs/y-utility/blob/98204141aead9bde38c6e9b569c2a3d597beaf58/y-keyvalue.js#L11

What if each of your points (or even the whole array of points) were frozen data (https://mobx-keystone.js.org/frozen-data) and the bindings would treat those frozen objects as plain values instead of YMaps/YArrays? would that help?

So for example, an snapshot like

{
  $modelType: 'polygon',
  points: {
    $frozen: true,
    data: [{x:1, y:1}, ...]
  }
}

would bind into

YMap {
  $modelType: 'polygon',
  points: plain object (instead of YMap) {
    $frozen: true,
    data: [{x:1, y:1}, ...] (instead of YArray of YMaps)
  }
}
collin commented 11 months ago

I'm not entirely sure what the implications of that would be.

What I've done is store things based on whether or not it is a simple value.

{
  $modelType: 'object',
  x: 10,
  y: 10,
  width: 100,
  height: 100,
  background: 'red',
  color: 'blue',
  children: [{
    $modelType: 'object',
    /* and so on... */
  }]
  /* and so on... */
}

In my yjs adapter I split out the data depending on whether or not it is a simple type, something like:

Y.Map {
  $modelType: 'object',
  data: Y.Array (controlled by YKeyValue) [
    { key: 'x', val: 10 },
    { key: 'y', val: 10 },
    { key: 'width', val: 100 },
    { key: 'y', val: 100 },
    /* and so on, this structure is maintained by YKeyValue, so I never directly interact with it as an array */
  ]
  children: Y.Array [
    Y.Map {
      $modelType: 'object',
      data: Y.Array (controlled by YKeyValue) [
        { key: 'x', val: 10 },
        { key: 'y', val: 10 },
        /* and so on... */
      },
      children: Y.Array []
    ]
  }]
}

Sort of like that.

Each model is stored in a Y.Map. The model's Y.Map has one Y.KeyValue that stores "primitive" props. Complex props like Models, Maps, Sets, Arrays, and Refs get attached as nested Y.Map/Y.Array instances as neccessary.

( In practice I actually have one root Document model that has a flat map of all the objects in the document. Then all the relationships between the models are maintained as refs that resolve to the root map. That way when I move models around I don't actually have to move large parts of the YJS document around, just lightweight refs. )

xaviergonz commented 11 months ago

Hi, I've been looking into it and doing some tests:

YKeyValue (randomly setting keys)
writing 100k updates on 1000 keys 12410ms
   Size of the encoded document V2: 26773
   Size of the json document: 27891

YMap (randomly setting keys)
  writing 100k updates on 1000 keys 269ms
   Size of the encoded document V2: 213587
   Size of the json document: 13891

YKeyValue map seems to save quite a bit of storage like you pointed out (at least for plain values, since that's all it supports right now), but at the same time it seems to be 46 times slower :-/

I've been looking into other CRDT libs and they seem to have the same issue though. For example, automerge uses around the same space as V2 (around 200KB) while LORO uses the same space as V1 (around 490KB).

I'll try to examine the source code to figure out the trick that makes it save space as well as what makes it slow (Perhaps there's a way to make it faster and, who knows, to support nested Y.js structures?). I have the feeling this is something that might be really useful if it were done in the Y.js side though, since saving space in plain objects seems like a very basic use-case.

In the meanwhile I released a v1.3 that saves frozen values as plain values in Y.js instead of transforming them deeply into Y.Map and Y.Array (I don't know if that might help in your particular case, but just FYI).

collin commented 11 months ago

Yeah, that's a pretty 👀 speed difference.

Running a similar shootout, it seems like the number of keys makes a big difference.

I ran with 10_000 random updates:

1000 Keys 500 Keys 250 Keys 125 Keys 75 Keys 40 Keys 20 Keys 10 Keys 5 Keys 1 Key
Y.Map 37ms 43ms 26ms 24ms 23ms 24ms 28ms 22ms 22ms 24ms
YKeyValue 575ms 306ms 171ms 105ms 95ms 71ms 65ms 55ms 64ms 50ms

Not very rigorous benchmarking, just one run, but the trend seems credible. YKeyValue is dominated by the number of keys stored, and Y.Map is just depends on the number of updates.

looking at the implementation, this makes a lot of sense. It looks like YKeyValue is converting the internal data into a javascript Array for every change. Looking closer at the implementation of toArray on Y.Array, it is indeed looping through the array every time.

Benchmark code ```ts import * as Y from "yjs"; import { YKeyValue } from "y-utility/y-keyvalue"; const count = 10_000; const keys = 1 const mapDoc = new Y.Doc(); const map = mapDoc.getMap("map"); let now = performance.now(); for (let i = 0; i < count; i++) { const key = (Math.random() * keys).toFixed(0); map.set(key, i); } console.log("Time Map: " + (performance.now() - now)); const kvDoc = new Y.Doc(); const array = kvDoc.getArray<{ key: string; val: unknown; }>("array"); const kv = new YKeyValue(array); now = performance.now(); for (let i = 0; i < count; i++) { const key = (Math.random() * keys).toFixed(0); kv.set(key, i); } console.log("Time KV: " + (performance.now() - now)); ```

For my case, I'm unlikely to do more than tens of updates per frame at 60FPS, and I'm in the range of < 100 keys. At that scale updates always sub-millisecond no matter which data type, so document size matters much more to me.