hoodiehq / discussion

General discussions and questions about Hoodie

Data Versioning and Revisions #41

Open davidpfahler opened 10 years ago

davidpfahler commented 10 years ago

Hi team Hoodie (:

inspired by some great experiences with Hoodie, the team and the community, I am thinking a lot about what one can create with it. Hoodie is about "very fast app development" and I keep wondering what kind of apps are possible with its approach (and which aren't).

One feature of modern apps that is growing in popularity and adoption is the ability to manage different versions of a piece of data. An example would be Google Docs, which builds a "Revision History" by creating a revision on every save, but also Dropbox (file-based).

@gr2m explained to me that CouchDB's revisions cannot be used for reliable versioning. Also, the API, e.g. hoodie.store.find, does not seem to anticipate or handle this issue in any way. Hence, I would like to start a discussion about whether Hoodie wants to support versioning and, if so, what the API should look like.

To foster the discussion, I'd like to present a thought experiment: If Hoodie (core) decided not to handle versioning, could one create a versioning system on the client side? To accomplish this, one might handle changes to the data in the app and provide Hoodie with a big array containing all versions. Whenever your app considers something a new version, that new object would be pushed onto that array and hoodie.store.update would be called with that array. In my opinion, this creates a layer of data management between the app and Hoodie. That is precisely what I would like to avoid, and avoiding it is the main reason I would use Hoodie. It does not seem like a good option to me.
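A minimal sketch of that thought experiment, assuming a hypothetical `versions` array property on the stored object and a hypothetical helper `pushVersion`; neither is part of Hoodie. The app keeps every version itself and hands the whole array to the store on each change:

```javascript
// Hypothetical helper (not a Hoodie API): returns a copy of `doc` with
// `newData` appended to its `versions` array, stamped with the save time.
function pushVersion(doc, newData) {
  const versions = (doc.versions || []).concat([
    Object.assign({}, newData, { savedAt: new Date().toISOString() })
  ]);
  return Object.assign({}, doc, { versions: versions });
}

let doc = { id: 'abc', versions: [] };
doc = pushVersion(doc, { title: 'Draft' });
doc = pushVersion(doc, { title: 'Final' });

// The app would then persist the whole array itself, e.g.
// hoodie.store.update('document', doc.id, { versions: doc.versions });
```

The full history travels with every update, which is the extra data-management layer objected to above.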

Below is my "Dreamcode" from a Frontend Dev perspective. It's just an idea, it might not even be possible, but I hope it is at least worth discussing. hoodie.store.find currently takes a type and an id argument. I propose a third argument version which is an int. It defaults to 0, meaning the most recent version. 1 would mean second last version and so on. localStore.save could implement this by increasing this version count by one. I have no idea how that would be implemented on the backend or with CouchDB, I hope someone can chime in on that.
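To make the numbering concrete, here is a small sketch of the proposed lookup semantics over a plain list of versions. `findVersion` and `versionList` are hypothetical names used only for illustration; how Hoodie or CouchDB would actually store the versions is left open, as in the proposal.

```javascript
// Hypothetical lookup over a list of versions ordered oldest to newest.
// version 0 (the default) = most recent, 1 = second-to-last, and so on.
function findVersion(versionList, version) {
  const offset = version === undefined ? 0 : version;
  const index = versionList.length - 1 - offset;
  if (!Number.isInteger(offset) || offset < 0 || index < 0) {
    throw new RangeError('No such version: ' + version);
  }
  return versionList[index];
}

const versionList = [{ name: 'v1' }, { name: 'v2' }, { name: 'v3' }];
const latest = findVersion(versionList);        // most recent entry
const previous = findVersion(versionList, 1);   // second-to-last entry
```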

I hope my feedback is relevant and I would be happy to help move this discussion and possibly an implementation forward.

Best, David

gr2m commented 10 years ago

Thanks a lot for putting that together David! I've no idea how that would be possible, but I want that, too :)

My understanding is that you want to be able to revert to earlier states of the current document, right? Maybe instead of (or in addition to) version numbers, a timestamp / date object would work?

// revert last change
hoodie.store.revert('document', id) 
// revert to a specific datetime
hoodie.store.revert('document', id).to(yesterdayUTCTimestamp)
// or with version numbers (revert to version 4)
hoodie.store.revert('document', id).to(4)
// revert a number of objects
promise = hoodie.store.findAll(tasklistAndTaskFilter);
hoodie.store.revert(promise)

I guess it would also be nice to see all available versions. There could be version numbers that need to be set explicitly, and automated versions if the version option is not set.

// always creates a version
hoodie.store.add('document', properties);
// creates implicit version (accessible with timestamps)
hoodie.store.update('document', id, changedProperties);
// creates explicit new version
hoodie.store.update('document', id, changedProperties, {newVersion: true});

Note that reverting can happen on both the frontend and the backend. For example, hoodie.store.revert('document', id), if the object is not found locally, can create an internal task that gets picked up by the backend, which looks into a special store with all deleted objects or whatnot and then resolves or rejects the task. That result then propagates back to the frontend, where the promise is resolved or rejected accordingly.

janl commented 10 years ago

+1, this would be a fab plugin.

Just an implementation note: In CouchDB this would be done using attachments. E.g. each writing of a new version would at the same time attach the current version to the new version as an attachment. You would want to have some garbage collection somewhere, so you don’t end up with infinite versions, but maybe you do want that. That could all be done entirely client-side, as well.
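A rough sketch of that idea, building only the document body that would be written. The `_attachments` shape with `content_type` and base64 `data` is CouchDB's inline attachment format; the `version` property, the `version-N.json` naming, the `maxVersions` limit, and the helper name are assumptions made for illustration. `Buffer` is Node-specific, so a browser would need a different base64 encoder.

```javascript
// Sketch only: returns the next document body, with the current body stored
// as an inline attachment. Names and the `version` property are assumptions.
function withPreviousVersionAttached(currentDoc, changes, maxVersions) {
  const body = Object.assign({}, currentDoc);
  const attachments = Object.assign({}, body._attachments);
  delete body._attachments; // snapshots do not nest older snapshots

  const version = body.version || 1;
  attachments['version-' + version + '.json'] = {
    content_type: 'application/json',
    data: Buffer.from(JSON.stringify(body)).toString('base64')
  };

  // Simple garbage collection: keep only the newest `maxVersions` snapshots.
  const versionNumber = (name) => Number(name.match(/\d+/)[0]);
  const snapshots = Object.keys(attachments)
    .filter((name) => /^version-\d+\.json$/.test(name))
    .sort((a, b) => versionNumber(a) - versionNumber(b));
  snapshots
    .slice(0, Math.max(0, snapshots.length - maxVersions))
    .forEach((name) => { delete attachments[name]; });

  return Object.assign({}, body, changes, {
    version: version + 1,
    _attachments: attachments
  });
}

let note = { _id: 'doc1', title: 'A' };
note = withPreviousVersionAttached(note, { title: 'B' }, 2);
note = withPreviousVersionAttached(note, { title: 'C' }, 2);
note = withPreviousVersionAttached(note, { title: 'D' }, 2);
```

With `maxVersions` set to 2, only the two most recent snapshots survive each write; passing `Infinity` would keep them all.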

davidpfahler commented 10 years ago

I can't think about this issue without thinking about git. It's where I'm most familiar with the concept of versioning. As Hoodie is about multiple instances of the data and sync, I think reverting, in the sense that you go back to an earlier version and discard the newer ones, is a bad idea (at least how I'm thinking about it). It's analogous to reset --hard and force pushing. I'm not implying that's what you, @gr2m, want to do, I just want to make it clear that this, I think, isn't a good strategy regarding sync.

Hence, I would focus more on retrieving earlier versions. Because if you can do that, reverting is as simple as getting the earlier version and calling hoodie.store.update with that earlier version's data.

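function revert() {
  // find() returns a promise; the third argument is the proposed version
  // offset (1 = second-to-last version) and does not exist in Hoodie today
  return hoodie.store.find('customer', 1234, 1).then(function (secondLastVersion) {
    return hoodie.store.update('customer', 1234, secondLastVersion);
  });
}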

This would implicitly create a new version but with the content of this older version. It's like git-revert; from the git revert --help:

Given one or more existing commits, revert the changes that the related patches introduce, and record some new commits that record them.

I hope this demonstrates that reverting isn't the main use case, although it is nice to have. I'm thinking about, for example, showing diffs of versions side by side or overlaid on top of each other. That's only possible if I can get any version of any document at any time (without reverting). What is needed is a more fundamental API on top of which other cool functions, such as revert, can be built.

As far as I understand you, @janl, you wouldn't consider this as a core feature? I'd like to ask you to give it a second thought. I think it is much harder to patch this in afterwards using a plugin than to think about how we can do this from the start. I think there might be a very cheap way to get this without uglifying the API or destroying the simplicity of the architecture. I'll think about it some more and report back if and when I come up with an elegant solution. At the very least, we should think about how we can make this as easy as possible for a plugin.
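As an illustration of what such a fundamental API would enable, here is a small sketch of a property-level diff between two versions that have already been retrieved. `diffVersions` is a hypothetical helper, not an existing Hoodie function, and it compares values via `JSON.stringify`, so nested objects with a different key order would count as changed.

```javascript
// Hypothetical helper: lists the properties that differ between two versions.
function diffVersions(older, newer) {
  const keys = new Set(Object.keys(older).concat(Object.keys(newer)));
  const changes = {};
  keys.forEach((key) => {
    if (JSON.stringify(older[key]) !== JSON.stringify(newer[key])) {
      changes[key] = { before: older[key], after: newer[key] };
    }
  });
  return changes;
}

const changes = diffVersions(
  { name: 'Ann', city: 'Berlin' },
  { name: 'Ann', city: 'Hamburg', vip: true }
);
// `changes` has entries for `city` and `vip`, but not for `name`
```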

Thanks for both of your comments and great advice. (:

janl commented 10 years ago

+1 on not using the git reset --hard equivalent.

As far as core features go: we are trying VERY hard to keep the core lean and flexible enough to have things like this be easy plugins. If that can’t be done, we need to drill open the core a little more.

Another angle: even today the user and email plugins are, well, plugins. We consider them part of the core product we ship, but they are not part of the core software that we develop. A versioning plugin may well become a default plugin at some point, but we’d need to see about that.

tl;dr: don’t mind me, keep going!

davidpfahler commented 10 years ago

I talked to @ehd and others about this and this is what we came up with:

Before changes are saved, we create a copy of the current (old) data. The data scheme needs to include a version property. If it does not, we treat the data as the first version (and add the property). We write this data to S3 (or somewhere) using a predictable file name scheme. Getting a version is as simple as requesting the file from S3. Reverting to a previous version is as simple as getting that version, setting its version property to latest + 1, and saving.
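A minimal sketch of this scheme, with an in-memory `Map` standing in for S3. The key format `<type>/<id>/<version>.json` and all function names are assumptions made for illustration, and no real S3 or Hoodie call is shown.

```javascript
// In-memory stand-in for S3 (or any blob store); keys play the role of file names.
const archive = new Map();

// Assumed predictable naming scheme: <type>/<id>/<version>.json
function versionKey(type, id, version) {
  return type + '/' + id + '/' + version + '.json';
}

// Copy the current (old) data to the archive and return its version number.
// Data without a version property is treated as the first version.
function archiveCurrent(type, id, current) {
  const version = current.version || 1;
  const copy = Object.assign({}, current, { version: version });
  archive.set(versionKey(type, id, version), JSON.stringify(copy));
  return version;
}

// Runs right before a save: archive the old data, return the data to save.
function saveWithSnapshot(type, id, current, changes) {
  const version = archiveCurrent(type, id, current);
  return Object.assign({}, current, changes, { version: version + 1 });
}

// Getting a version is a plain lookup by key.
function getVersion(type, id, version) {
  const stored = archive.get(versionKey(type, id, version));
  if (stored === undefined) throw new Error('Unknown version ' + version);
  return JSON.parse(stored);
}

// Reverting: fetch the old version and save its content as latest + 1.
function revertTo(type, id, current, version) {
  const old = getVersion(type, id, version);
  const latest = archiveCurrent(type, id, current);
  return Object.assign({}, old, { version: latest + 1 });
}

let customer = { name: 'Ann' };
customer = saveWithSnapshot('customer', '1234', customer, { name: 'Anna' });
customer = saveWithSnapshot('customer', '1234', customer, { name: 'Annabel' });
customer = revertTo('customer', '1234', customer, 1);
```

Because a revert is saved as a new version, the history only ever grows, which matches the git revert analogy from earlier in the thread.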

Does that make sense? The only issue I see implementing this as a plugin would be to get the hook to inject the plugin right before save. Ideas?