Closed HengCC closed 7 months ago
I store and read like this:
read:
if (result.content && result.content !== "") {
  return Promise.resolve(Buffer.from(result.content, 'binary'));
} else {
  return Promise.resolve(null);
}
store:
content = state.toString("binary")
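For reference, the store/read round trip above can be sketched with Node's Buffer alone. This is a minimal sketch, not the actual persistence code: the helper names encodeForStorage and decodeFromStorage are hypothetical, and it assumes the state arrives as a Buffer or Uint8Array. One thing it illustrates is that calling toString("binary") on a plain Uint8Array ignores the encoding argument and produces a comma-separated list of numbers, so wrapping the state in a Buffer first is the safer habit.

```javascript
// Hypothetical helpers: names are illustrative, not part of any library.

// Encode binary state as a latin1 ("binary") string for a text column.
function encodeForStorage(state) {
  // Buffer.from() accepts a Buffer or a Uint8Array; wrapping first makes
  // sure the encoding argument of toString() is actually honoured.
  return Buffer.from(state).toString('binary');
}

// Decode the stored string back to bytes, or null when nothing is stored.
function decodeFromStorage(content) {
  if (!content) {
    return null;
  }
  return Buffer.from(content, 'binary');
}

// Sample bytes covering both ends of the byte range.
const sample = Uint8Array.from([0, 1, 127, 128, 200, 255]);

// Pitfall: on a plain Uint8Array, toString() ignores the argument
// and joins the numbers with commas.
const pitfall = sample.toString('binary');

const stored = encodeForStorage(sample);
const restored = decodeFromStorage(stored);
```

Each byte maps to exactly one character here, so the stored string has the same length as the input. Whether the database column preserves those characters unchanged depends on its character set, which is why a binary column type or base64 is often chosen instead.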
Hey @HengCC, the data is stored in a binary Yjs format, which is highly efficient and really fast. Yjs has to track the history of all changes that any user has made, which is why the document naturally gets bigger over time. Without looking at your Yjs document or knowing exactly how you're making changes, it's impossible to know what causes your huge document, but this definitely should not happen.
Have you maybe turned off garbage collection (https://docs.yjs.dev/api/y.doc)?
@janthurau Thanks for your reply. I just logged the gc configuration. The default is true and I haven't changed it, and there seems to be no good way to know what is growing. Are there any tools that can analyze Yjs documents, so we can see what is taking up so much space?
@HengCC, you can load it into the new Yjs Playground.
@nperez0111 Thanks. Using this tool I analyzed the stored data. As mentioned above, the actual content is not large, but I found that there are a lot of clients in the Yjs doc. The amount of data they take up is astonishing. I'm thinking of ways to eliminate it.
Hello, is there any good way to deal with these clients?
@huanghantao
I don't have a good way to deal with it right now. You can ask in the Yjs community. I am trying a possible solution, though, which only works if you can afford to discard the history of these clients. In my case, all I really want is a final copy of the document. The collaboration process itself does not really matter; I just need to make a regular backup copy. I'm going to construct a new ydoc before persisting, and then merge the currentState of the current ydoc into it. However, I am not sure whether the client information can be discarded in this way. You can also try it.
Just weighing in on how I deal with this. We store both the Yjs CRDT and a JSON snapshot of the data at the tip of that.
After a certain period of inactivity we archive the Yjs CRDT, expiring it. The next time it's requested we create a fresh CRDT from the JSON snapshot of the data (with no history).
We store a generation number in the CRDT that gets bumped every time it's recreated. The client sends this generation number when connecting. If their generation does not match the copy on the server (or there is none on the server because it expired), the client is told to discard their local copy of the CRDT and resync with the server, starting from an empty document.
The downside is that anyone who has unsynced offline changes from before the point of expiry will lose those changes. They must discard them; they cannot be synced. We think this is a fair trade-off, and it is why we expire only after a long enough period of inactivity.
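The generation check described above could look roughly like the following. This is a minimal sketch of the decision logic only, under stated assumptions: the function names decideHandshake and nextGeneration and the 'sync' / 'reset' return values are hypothetical, and how the generation number is stored inside the CRDT and sent over the wire is left out.

```javascript
// Hypothetical helpers: names and return values are illustrative only.

// Decide what a connecting client should do.
// clientGeneration: the generation the client's local copy was built from.
// serverGeneration: the generation of the server's live copy, or
//                   null/undefined when the server copy has expired.
function decideHandshake(clientGeneration, serverGeneration) {
  if (serverGeneration == null) {
    // No live CRDT on the server: the client must drop its local copy.
    return 'reset';
  }
  // Only a matching generation shares history with the server copy.
  return clientGeneration === serverGeneration ? 'sync' : 'reset';
}

// Bump the generation whenever the CRDT is recreated from the JSON snapshot.
// Starting at 1 for a document with no previous generation is an assumption.
function nextGeneration(previousGeneration) {
  return (previousGeneration ?? 0) + 1;
}
```

A client that receives 'reset' would throw away its local document, unsynced offline edits included, and resync from an empty one, which is the trade-off mentioned above.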
In our existing collaborative environment, documents are stored as binary strings in the database. For reasons I don't understand, as the number of collaborators grows, the binary document keeps getting bigger even when the document content has not actually changed, and the growth is increasingly hard to control. Take the document in the screenshot below as an example. The text form of the original content is about 5KB. After many people edited it at the same time, the content did not change, but the binary content is a staggering 15MB. This is obviously disastrous: the document takes longer to load, and the size will continue to increase. Is there any way to avoid such a meaningless increase in size?