oliver-oloughlin / kvdex

A high-level abstraction layer for Deno KV with zero third-party dependencies by default 🦕🗝️
https://jsr.io/@olli/kvdex
MIT License
192 stars 6 forks source link

Version history helpers? #124

Closed juliendorra closed 9 months ago

juliendorra commented 10 months ago

My understanding is that Deno KV keeps the full history of a given key. (or am I mistaken?)

When using KV directly, we can keep tack of history for example like this: https://deno-blog.com/A_Comprehensive_Guide_to_Deno_KV.2023-06-30#tracking-a-records-history

It would be helpful to have higher level functions in kvdex to list the history of a given entry and to retrieve a previous version of a key.

Would it makes sense, or do you think it's better for devs to rely on their own versioning scheme on top of kvdex/kv?

oliver-oloughlin commented 10 months ago

If the version history is persisted, that's a lot of extra data being stored. I think it then makes more sense for developers to manage that themselves if they need it. I might add it as an optional feature in the future, I think the idea is pretty good.

oliver-oloughlin commented 10 months ago

Btw KV does not persist the history by itself, you still have to manage all that yourself.

oliver-oloughlin commented 10 months ago

I've added version history support in the release (v0.25.1)

juliendorra commented 10 months ago

[got an email notifications with some ideas but it doesn't appears here 🤔]

I'm exploring some ways I could implement version history, but I have a few questions I would like to ask, as it can be done in a few different ways: How should deleting a document effect its history? I see 3 possibilities: It doesn't affect the history at all, it deletes all the history, or it creates a new history entry with a value of null to indicate it was deleted at that time. Should history entries contain the versionstamp of the document, or just a timestamp? Adding the versionstamp introduces extra complexity and delay, while just having a timestamp is much easier and more robust against consistency issues. Right now I'm leaning towards 1) Create an entry with null value to record deletes, 2) Just have timestamp.

What I did for my use case is create a new entry for each version using a common id value (The ID is a custom nanoid generated by the server, because it's user facing and I needed something clean/easy in the URLs). This id is used a secondary index. I also add a timestamp value to each version.

I used ulid, following your advice, for the keys, to be able to sort the entries by insert order, so as to not rely on the timestamps to retrieve the last version or the last 5, etc. Ulid property that its lexicographical order is also a temporal order is great in this case.

So in my case versions are independent entries, and I don't use or care about the versionstamp, as the ulid key gives me order, and the timestamp can give me a precise moment in time.

One issue would be if I wanted to clone the entries to another DB. I would have to make sure to keep the keys (most probably), or recreate keys in order.

If a user would delete a document, I would probably, in my use case, delete all versions. That's because in my case, it would be the expectation of the end user and because my goals is for teams to use it for internal use. I would probably create a grace delay with some mechanism, but would end up deleting them for good.

But I also think that pseudo-deleting by adding a special empty version is an interesting use case. It might even be something a company ask for legal reasons, traceability.

But, I'm not sure at all that pseudo-deleting by adding a special empty version should be managed at kvdex level, it seems something you decide at application level, because you don't want to really delete the versions.

At kvdex level, a delete should be a real delete, as in that case leaving the whole document history would be a very surprising behavior for the developer. And would leave us without a simple way to remove the history at once. [which could be needed for GDPR legal reasons for example. In Eu we have to delete personal data after some time if the user leave or ask]

And if a developer don't really want to delete a document's history, let them hide it in their own way instead! That's high level application stuff in my opinion! 😁

oliver-oloughlin commented 10 months ago

It was my intuition that a version history should persist all states of a document, deleted also being a state. The way I have implemented it differs slightly from my original thinking, where instead of a null or empty value, a history entry has a type property, specifying if the state is a write or a delete. Write entries have a copy of the document value, while delete entries do not.

My reason for doing this is that it enables a wider range of functionality, where every state is accounted for. For example if you were to delete a document, and then later write to that same id, you would be able to see the entire history of that id instead of loosing it. I plan to add some more utility functions to allow for deleting a document's history. Right now history is persisted as long as history is enabled for the collection, and can only be deleted by wiping the entire database (db.wipe())

Edit: I have also considered supporting 2 different variations of version history: "persistent" => like it is implemented currently. "temporary" => where deleting a document also deletes its history.

oliver-oloughlin commented 10 months ago

After some initial attempts it turns out it's actually harder to delete history on deleting a document instead of recording the delete. This is because of the history being a range of multiple entries, requiring every entry to first be read using a list iterator, and then deleting the key for every entry.

Meaning its far less performant than simply keeping the history.

juliendorra commented 10 months ago

If you come from a git perspective, the persistent history fits the mental model, for sure!

a history entry has a type property, specifying if the state is a write or a delete. Write entries have a copy of the document value, while delete entries do not.

I'll test using that! Your implementation might fit my use case better, especially in terms of listing the last versions, listing the document without manually filtering versions, etc.

My only (big) issue is that it is a legal obligation to be able to delete all personal data of an user in EU (GDPR), under the penalty of quite heavy fines, and so I ultimately will need a way to delete things that can be considered personal data for good without deleting the whole db 😅 (By the way as an user, I like the idea in the GDPR that services should keep my data only as long as its needed and delete on demand!)

I hope you'll find a way to allow deleting the whole content's history of a particular document. Maybe there's two use case here:

wipeEntireHistory versus emptyContentHistory maybe?

oliver-oloughlin commented 10 months ago

I'm just going to reopen this issue as there are still some specifics that should be solved.

I see your point about GDPR, it's definitely an important concern to consider. It is now possible to delete a documents history. I'm wondering if it makes sense to also include the possibility of deleting a document's history when deleting the document itself, through an option like deleteHistory: true. But this would change all the delete...() APIs, and would mean a breaking change for delete(), so I'm a little unsure if it's necessary, especially since it's only a slight enhancement to utility, and not a new feature being enabled.

When it comes to keeping just the last delete entry, that shouldn't be too difficult, although I think the API is getting a little cluttered with delete functions now, so I would probably implement it through options for the existing deleteHistory() instead.

juliendorra commented 10 months ago

Yes, it's a bit complex to do cleanly in terms of API because there's 3 different actions that I see a dev doing / needing to do:

  1. just mark a document as deleted = add a delete entry to history
  2. also delete history itself, keeping the last delete entry
  3. wipe all trace of this document existence

Are 1 and 2 the same with an option? It feels like 3 is the one apart?

juliendorra commented 10 months ago

Looking at the current additions, deleteHistory is now doing 3, "wipe all trace of this document existence" right?

Or is it doing 2: "keep one single delete entry as history"?

[in any case I quite like the explicit deleteHistory call, as this is obviously a rare operation that you want to control]

oliver-oloughlin commented 10 months ago

Right now it is doing 3, wipe all trace of this document's existence. 1 is basically done as well whenever a delete operation is performed, a new history entry with type "delete" is added, without deleting any previous history entries (but deleting the actual document entry of course).

juliendorra commented 10 months ago

I'm in the process of changing my version code to use kvdex history 😃😅

One thing is that I now can't limit the history results. In my case, an user can create dozens of entries in just an hour of playing with the web app, and I just want to retrieve / show them the last 5 at a time.

Any start - end - limit options would of course help as history become bigger for navigating it and then presenting it to the user. For now I will filter the whole history.

PS: I don't know how early/low level Deno KV allow pagination and limits in the request? Of course ideally if the non-needed history entries never leave the db, it's of course the best, but as Deno KV is quite simple, maybe it is not that helpful?

oliver-oloughlin commented 10 months ago

KV allows for cursor pagination. So I can definitely implement that just like with all the xxxMany() operations. With all of the list options added it should be pretty easy to get the last 5 entries by using the options: { reverse: true, limit 5 }

I'll provide an update in this issue thread when I have some working implementation.

oliver-oloughlin commented 10 months ago

findHistory() now supports pagination and filtering in the latest release (v0.27.0). Note that this also brings a breaking change to the return type of the method, by making it follow the same signature as other pagination-supported methods that return a result and cursor.

Please let me know if any additional related issues were to arise, otherwise I think I will deem this feature as complete for the time being.