yjs / y-indexeddb

IndexedDB database adapter for Yjs
https://docs.yjs.dev/ecosystem/database-provider/y-indexeddb
Other
192 stars 30 forks source link

Doc grows without any changes #31

Open raineorshine opened 1 year ago

raineorshine commented 1 year ago

Checklist

Describe the bug

The size of the object store grows every time the page is refreshed. This occurs because the doc is encoded as an update and appended to the object store every time IndexeddbPersistence is instantiated.

The db is compacted after reaching the PREFERRED_TRIM_SIZE whenever an update is stored. However, this assumes 1) a small number of Docs, and 2) Docs will be modified. While reasonable on the surface, neither of these assumptions hold in all cases. I currently instantiate thousands of Docs for an offline-first graph database built on YJS. Many hundreds are instantiated and destroyed dynamically as the user navigates the graph, and some may never be modified.

To Reproduce

Minimal example:

https://codesandbox.io/p/sandbox/pedantic-euler-jje92f?file=%2Fsrc%2FApp.tsx%3A1%2C1-2%2C1

Notice that no modifications are made to the Doc, yet it grows in size.

Expected behavior

The persisted Doc should not grow indefinitely when no changes are made.

Environment Information

Additional context

I can see that the behavior was introduced in this commit.

~One possible solution is adding the condition if (updates.length === 0) to beforeApplyUpdatesCallback, as it seems to only be needed on a new Doc. That stops the infinite growth, and the tests pass, however I don't know the full implications of this change. If beforeApplyUpdatesCallback needs to be invoked in other cases that are not covered by the tests that would be important to know.~

Is there a condition that detects when beforeApplyUpdatesCallback does not need to be called? I have tried skipping it when the Doc is empty, but that does not work.

himself65 commented 1 year ago

Maybe this is related to https://github.com/yjs/y-indexeddb/issues/25

raineorshine commented 1 year ago

Yes, it is the same cause. One difference though is that I never instantiate more than one IndexeddbPersistence per Doc. I just have a lot of Docs.

Our one need is to support connect and disconnect many times. For example we will have many ydoc in storage but only enable the foreground ydoc.

Exactly.

Perhaps if I better understand the purpose of beforeApplyUpdatesCallback I can craft a more fine-tuned solution. Infinite growth of hundreds of unmodified Doc is less than ideal.

The other approach is to compact the db much more aggressively, though my hope is that this won't be necessary.

@Himself65 Can you share the source for https://www.npmjs.com/package/@toeverything/y-indexeddb? I don't see it on your GitHub.

himself65 commented 1 year ago

@Himself65 Can you share the source for https://www.npmjs.com/package/@toeverything/y-indexeddb? I don't see it on your GitHub.

https://github.com/toeverything/AFFiNE/tree/master/packages/y-indexeddb

It's open source, I might should update package.json later

jeffrafter commented 8 months ago

Just wanted to leave a note here that this conversation continued on the discuss board:

https://discuss.yjs.dev/t/refreshing-page-causes-y-indexeddb-to-accumulate-db-entries/984/17

@raineorshine your solution looked good:

I have resorted to a more brute force approach of triggering a database compaction on the initial sync. You can do this by setting PREFERRED_TRIM_SIZE = 0, or calling a debounced storeUpdate(this, true) after the initial sync. Here is an example of the latter approach, which I am currently using: https://github.com/raineorshine/y-indexeddb/commit/178718bf1d0b12807ff8d9f56d139e1b53d6a608

I wonder if you are still using that approach?

raineorshine commented 8 months ago

Yes, the initial compaction on load has been working well for me. Performance seems fine anecdotally (with hundreds of Docs) though I haven't done any measurements.

HMarzban commented 6 months ago

I've noticed that the IndexedDB size keeps growing after each reload. Although this increase does not immediately affect performance, as @raineorshine mentioned, it could become a major issue over time.

@raineorshine, what's your solution to prevent this issue, and how do you handle it in production? @dmonad, what do you suggest we do? Should we continue using this version, or can we expect an update soon?

raineorshine commented 6 months ago

@HMarzban My solution is here: https://github.com/raineorshine/y-indexeddb/commit/178718bf1d0b12807ff8d9f56d139e1b53d6a608.

I've been using it for a while now and it works well. I haven't measured the performance impact, but one IndexedDB write on load doesn't seem very significant.

HMarzban commented 6 months ago

@HMarzban My solution is here: raineorshine@178718b.

I've been using it for a while now and it works well. I haven't measured the performance impact, but one IndexedDB write on load doesn't seem very significant.

Oh, great job, sounds straightforward. did you submit a PR for your solution?

raineorshine commented 6 months ago

No, because it comes with a performance cost and may not be the right solution for everybody. It could be opt-in though.

disarticulate commented 1 month ago

Hey all @raineorshine @HMarzban @jeffrafter @himself65 I forked this and implemented what I think is the obvious bug here:

https://github.com/yjs/y-indexeddb/compare/master...disarticulate:y-indexeddb:patch-1

In short, the constructor function calls to initiate then doc sync:

this._db.then(db => {
  [...]
  fetchUpdates(this, beforeApplyUpdatesCallback, afterApplyUpdatesCallback)
}

This is an async call; before this is completed a listener is already attached via: doc.on('update', this._storeUpdate)

So the update itself gets stored when it's already listening for updates, hence the duplication. I've only implemented the minimum fix here, but a more robust implementation would add a pre-sync function listener doc.on('update', this._queueUpdate) for the time between constructor and whenSynced; apply any updates received, then replace doc.on('update', this._storeUpdate.

However, it seems like if you simply ensure the client always calls whenSynced before changes are made, it should work. I'll be testing it shortly.

disarticulate commented 1 month ago

Looks like I'm still getting a update every time I refresh, but I need to create a minimal reproduction without other events occurring.

A minimal implement still gets a update. The next suspect is fetchUpdates, here:

      beforeApplyUpdatesCallback(updatesStore)
      Y.transact(idbPersistence.doc, () => {
        updates.forEach(val => Y.applyUpdate(idbPersistence.doc, val))
      }, idbPersistence, false)
      afterApplyUpdatesCallback(updatesStore)

The problem with afterApplyUpdatesCallback is it likely occurs before the updates are completed, because the transaction will be scheduled later. The solution i've implemented elsewhere in client code is to create a origin system as Y.applyUpdate takes an optional origina object. However, then you're checking for the init origin every time you get an update, althought it might suffice to simply swap function listeners once you pass the whenSynced stage.

It might actually suffice to just put afterApplyUpdatesCallback(updatesStore) inside the transaction, but I'll need to test that later.