Open raineorshine opened 1 year ago
Maybe this is related to https://github.com/yjs/y-indexeddb/issues/25
Yes, it is the same cause. One difference though is that I never instantiate more than one IndexeddbPersistence per Doc. I just have a lot of Docs.
Our one need is to support connect and disconnect many times. For example we will have many ydoc in storage but only enable the foreground ydoc.
Exactly.
Perhaps if I better understand the purpose of beforeApplyUpdatesCallback
I can craft a more fine-tuned solution. Infinite growth of hundreds of unmodified Doc is less than ideal.
The other approach is to compact the db much more aggressively, though my hope is that this won't be necessary.
@Himself65 Can you share the source for https://www.npmjs.com/package/@toeverything/y-indexeddb? I don't see it on your GitHub.
@Himself65 Can you share the source for https://www.npmjs.com/package/@toeverything/y-indexeddb? I don't see it on your GitHub.
https://github.com/toeverything/AFFiNE/tree/master/packages/y-indexeddb
It's open source, I might should update package.json
later
Just wanted to leave a note here that this conversation continued on the discuss board:
https://discuss.yjs.dev/t/refreshing-page-causes-y-indexeddb-to-accumulate-db-entries/984/17
@raineorshine your solution looked good:
I have resorted to a more brute force approach of triggering a database compaction on the initial sync. You can do this by setting
PREFERRED_TRIM_SIZE = 0
, or calling a debouncedstoreUpdate(this, true)
after the initial sync. Here is an example of the latter approach, which I am currently using: https://github.com/raineorshine/y-indexeddb/commit/178718bf1d0b12807ff8d9f56d139e1b53d6a608
I wonder if you are still using that approach?
Yes, the initial compaction on load has been working well for me. Performance seems fine anecdotally (with hundreds of Docs) though I haven't done any measurements.
I've noticed that the IndexedDB size keeps growing after each reload. Although this increase does not immediately affect performance, as @raineorshine mentioned, it could become a major issue over time.
@raineorshine, what's your solution to prevent this issue, and how do you handle it in production? @dmonad, what do you suggest we do? Should we continue using this version, or can we expect an update soon?
@HMarzban My solution is here: https://github.com/raineorshine/y-indexeddb/commit/178718bf1d0b12807ff8d9f56d139e1b53d6a608.
I've been using it for a while now and it works well. I haven't measured the performance impact, but one IndexedDB write on load doesn't seem very significant.
@HMarzban My solution is here: raineorshine@178718b.
I've been using it for a while now and it works well. I haven't measured the performance impact, but one IndexedDB write on load doesn't seem very significant.
Oh, great job, sounds straightforward. did you submit a PR for your solution?
No, because it comes with a performance cost and may not be the right solution for everybody. It could be opt-in though.
Hey all @raineorshine @HMarzban @jeffrafter @himself65 I forked this and implemented what I think is the obvious bug here:
https://github.com/yjs/y-indexeddb/compare/master...disarticulate:y-indexeddb:patch-1
In short, the constructor function calls to initiate then doc sync:
this._db.then(db => {
[...]
fetchUpdates(this, beforeApplyUpdatesCallback, afterApplyUpdatesCallback)
}
This is an async call; before this is completed a listener is already attached via:
doc.on('update', this._storeUpdate)
So the update itself gets stored when it's already listening for updates, hence the duplication. I've only implemented the minimum fix here, but a more robust implementation would add a pre-sync function listener doc.on('update', this._queueUpdate)
for the time between constructor and whenSynced; apply any updates received, then replace doc.on('update', this._storeUpdate
.
However, it seems like if you simply ensure the client always calls whenSynced before changes are made, it should work. I'll be testing it shortly.
Looks like I'm still getting a update every time I refresh, but I need to create a minimal reproduction without other events occurring.
A minimal implement still gets a update. The next suspect is fetchUpdates
, here:
beforeApplyUpdatesCallback(updatesStore)
Y.transact(idbPersistence.doc, () => {
updates.forEach(val => Y.applyUpdate(idbPersistence.doc, val))
}, idbPersistence, false)
afterApplyUpdatesCallback(updatesStore)
The problem with afterApplyUpdatesCallback
is it likely occurs before the updates are completed, because the transaction will be scheduled later. The solution i've implemented elsewhere in client code is to create a origin system as Y.applyUpdate takes an optional origina object. However, then you're checking for the init origin every time you get an update, althought it might suffice to simply swap function listeners once you pass the whenSynced stage.
It might actually suffice to just put afterApplyUpdatesCallback(updatesStore)
inside the transaction, but I'll need to test that later.
Checklist
Describe the bug
The size of the object store grows every time the page is refreshed. This occurs because the doc is encoded as an update and appended to the object store every time
IndexeddbPersistence
is instantiated.The db is compacted after reaching the
PREFERRED_TRIM_SIZE
whenever an update is stored. However, this assumes 1) a small number of Docs, and 2) Docs will be modified. While reasonable on the surface, neither of these assumptions hold in all cases. I currently instantiate thousands of Docs for an offline-first graph database built on YJS. Many hundreds are instantiated and destroyed dynamically as the user navigates the graph, and some may never be modified.To Reproduce
Minimal example:
https://codesandbox.io/p/sandbox/pedantic-euler-jje92f?file=%2Fsrc%2FApp.tsx%3A1%2C1-2%2C1
Notice that no modifications are made to the Doc, yet it grows in size.
Expected behavior
The persisted Doc should not grow indefinitely when no changes are made.
Environment Information
Additional context
I can see that the behavior was introduced in this commit.
~One possible solution is adding the condition
if (updates.length === 0)
tobeforeApplyUpdatesCallback
, as it seems to only be needed on a new Doc. That stops the infinite growth, and the tests pass, however I don't know the full implications of this change. IfbeforeApplyUpdatesCallback
needs to be invoked in other cases that are not covered by the tests that would be important to know.~Is there a condition that detects when
beforeApplyUpdatesCallback
does not need to be called? I have tried skipping it when the Doc is empty, but that does not work.