redgeoff / delta-pouch

Conflict-free collaborative editing for PouchDB
196 stars 13 forks source link

revs #32

Closed calvinmetcalf closed 9 years ago

calvinmetcalf commented 9 years ago

feature request: require revs, currently as this is I wouldn't want to use it in production because if there are other edits being made they can over write each other in unpredictable ways, an even better way to do it might be to have it only have it throw a conflict if the fields we are modifying have been updated since the rev I supplied. The second part is less important then the first because think about

you put in a document

{
   "location": "home",
    "activity": "reading"
}

then you update it

{
    "location": "bar",
     "activity": "getting drunk"
}

but if somewhat was to delta pouch to express that they are reading at the library having seen the first one and not the second they would change it to

{
    "location": "library",
     "activity": "getting drunk"
}
redgeoff commented 9 years ago

Thanks for the feedback, but unfortunately I don't think I understand enough about the way pouch/couch handles revs to visualize what you are suggesting. In my mind a timeline might look like this (in chronological order):

clients 1 & 2 are offline

client 1:

{ 'location': 'home'}

client 2:

{ 'location': 'bar', 'activity': 'getting drunk'}

client 1:

{ 'activity' : 'reading'}

clients 1 & 2 come back online and sync

Yes, the result would be the latest changes, e.g.

{ 'location': 'bar', 'activity': 'reading'}

but I consider this the expected behavior in a collaborative environment where we honor the latest changes. In reality, this type of thing probably wouldn't happen much because if say our docs were actions, it wouldn't make much sense for a user to completely change the details of the action to something else, e.g. I would expect flows more like

{ 'location': 'bar' }
{ 'activity': 'having a drink'}
{ 'activity': 'getting drunk'}

I'd love to hear your opinion if you think differently.

calvinmetcalf commented 9 years ago

a couple things to note, they don't have to be both offline they can both be syncing their changes to a 3rd database

the whole _rev thing is for this situation, basically

client 1:

{ 'location': 'home',  'activity': 'reading'}

server: this is rev 1 of the document

client 2: I have rev 1 please update it to now be:

{ 'location': 'bar', 'activity': 'getting drunk'}

server: ok this is rev 2

client 1: I have rev 1 please update it like this

{ 'location' : 'library'}

server: woah that's no longer the current rev the document has changed since then.

this way you know your updating what you mean to be updating even if the document were to change between when you sent the request and the database received it

redgeoff commented 9 years ago

This definitely has me thinking... what if I kept track of the latest rev for each attribute? But then how should the following play out?

client 1 & 2 offline
client1: { name: 'John Smith'}
client2: { name: 'Jack Brown' }
client1: { telephone: '646-11-SMITH' }
client2: { email: 'brown@example.com' }
client 1 syncs
client 2 syncs

should the result be:

{ name: 'John Smith', telephone: '646-11-SMITH', email: 'brown@example.com' }

or

{ name: 'John Smith', telephone: '646-11-SMITH' }
calvinmetcalf commented 9 years ago

You could instead of deleting field's replace them with $deleted so of there is something in the field of takes precedence over nothing, but removing a field is treated like a value On Aug 23, 2014 4:46 PM, "Geoff Cox" notifications@github.com wrote:

This definitely has me thinking... what if I kept track of the latest rev for each attribute? But then how should the following play out?

client 1 & 2 offlineclient1: { name: 'John Smith'}client2: { name: 'Jack Brown' }client1: { telephone: '646-11-SMITH' }client2: { email: 'brown@example.com' }client 1 syncsclient 2 syncs

should the result be:

{ name: 'John Smith', telephone: '646-11-SMITH', email: 'brown@example.com' }

or

{ name: 'John Smith', telephone: '646-11-SMITH' }

— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-53165080.

redgeoff commented 9 years ago

Sure, but that's not the detail I'm worried about. Instead, my point is essentially whether the revs need to be tracked at the individual attribute layer or at the doc layer, e.g. in my example would you expect the end merged result to be

{ name: 'John Smith', telephone: '646-11-SMITH', email: 'brown@example.com' }

OR

{ name: 'John Smith', telephone: '646-11-SMITH' }
calvinmetcalf commented 9 years ago

So what you could do is that if the rev is not current, fetch the old rev (if it's available) and see which fields have changed so only reject then if they did On Aug 23, 2014 6:05 PM, "Geoff Cox" notifications@github.com wrote:

Sure, but that's not the detail I'm worried about. Instead, my point is essentially whether the revs need to be tracked at the individual attribute layer or at the doc layer, e.g. in my example would you expect the end merged result to be

{ name: 'John Smith', telephone: '646-11-SMITH', email: 'brown@example.com' }

OR

{ name: 'John Smith', telephone: '646-11-SMITH' }

— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-53167754.

redgeoff commented 9 years ago

yes, but I'm still trying to determine either A: { name: 'John Smith', telephone: '646-11-SMITH', email: ' brown@example.com' } OR B: { name: 'John Smith', telephone: '646-11-SMITH' }

On Sat, Aug 23, 2014 at 3:38 PM, Calvin Metcalf notifications@github.com wrote:

So what you could do is that if the rev is not current, fetch the old rev (if it's available) and see which fields have changed so only reject then if they did On Aug 23, 2014 6:05 PM, "Geoff Cox" notifications@github.com wrote:

Sure, but that's not the detail I'm worried about. Instead, my point is essentially whether the revs need to be tracked at the individual attribute layer or at the doc layer, e.g. in my example would you expect the end merged result to be

{ name: 'John Smith', telephone: '646-11-SMITH', email: ' brown@example.com' }

OR

{ name: 'John Smith', telephone: '646-11-SMITH' }

— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-53167754.

— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-53168797.

calvinmetcalf commented 9 years ago

Good point, yeah just throw an error if the document has changed don't add the inconsistent email if the name has changed On Aug 23, 2014 6:41 PM, "Geoff Cox" notifications@github.com wrote:

yes, but I'm still trying to determine either A: { name: 'John Smith', telephone: '646-11-SMITH', email: ' brown@example.com' } OR B: { name: 'John Smith', telephone: '646-11-SMITH' }

On Sat, Aug 23, 2014 at 3:38 PM, Calvin Metcalf notifications@github.com

wrote:

So what you could do is that if the rev is not current, fetch the old rev (if it's available) and see which fields have changed so only reject then if they did On Aug 23, 2014 6:05 PM, "Geoff Cox" notifications@github.com wrote:

Sure, but that's not the detail I'm worried about. Instead, my point is essentially whether the revs need to be tracked at the individual attribute layer or at the doc layer, e.g. in my example would you expect the end merged result to be

{ name: 'John Smith', telephone: '646-11-SMITH', email: ' brown@example.com' }

OR

{ name: 'John Smith', telephone: '646-11-SMITH' }

— Reply to this email directly or view it on GitHub < https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-53167754>.

— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-53168797.

— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-53168891.

redgeoff commented 9 years ago

So, that's the point. If any change invalidates a rev for the entire doc then what's the point of delta pouch?

redgeoff commented 9 years ago

I think the key may be to leave delta pouch the way it is and expect the UI to communicate the conflicts.

nolanlawson commented 9 years ago

@calvinmetcalf I think the goal of delta-pouch is to achieve better performance by making exactly the sacrifice you describe – conflicts can no longer be handled in a Couch-like way. In fact delta-pouch will never throw a 409 if I understand correctly.

Fetching the rev wouldn't even solve the problem, really, because two databases could still get out of sync. What if two clients are offline? They can fetch the "latest rev," but each one sees their own version of that, so ultimately they will push conflicts to the database. In delta-pouch's system, it's just a basic "last write wins."

The idea is something like what @wohali describes at the end of this blog post or as summed up by Accountants Don't Use Erasers. You basically treat CouchDB/PouchDB as a big append-only storage system. It doesn't work for every use case, but when it does work, you can get great performance benefits – e.g. all documents are generation 1, so we can always use the generation-1 optimization during replication.

calvinmetcalf commented 9 years ago

Yeah I was wrong about fetching specific revs. But we still have to fetch the old version so we can at least check the rev, revs while annoying are a feature not a bug in couch/pouch and that becomes more important when you are doing diffs, this is less of a case of 2 off line clients as opposed to 2 online local dbs both replicating to a remote database. We want to avoid situations where the doc you are editing gets updates out from under you and you just overwrite the change. On Sep 7, 2014 12:07 PM, "Nolan Lawson" notifications@github.com wrote:

@calvinmetcalf https://github.com/calvinmetcalf I think the goal of delta-pouch is to achieve better performance by making exactly the sacrifice you describe – conflicts can no longer be handled in a Couch-like way. In fact delta-pouch will never throw a 409 if I understand correctly.

Fetching the rev wouldn't even solve the problem, really, because two databases could still get out of sync. What if two clients are offline? They can fetch the "latest rev," but each one sees their own version of that, so ultimately they will push conflicts to the database. In delta-pouch's system, it's just a basic "last write wins."

The idea is something like what @wohali https://github.com/wohali describes at the end of this blog post http://atypical.net/archive/2014/04/17/understanding-race-induced-conflicts-in-bigcouch or as summed up by Accountants Don't Use Erasers http://blogs.msdn.com/b/pathelland/archive/2007/06/14/accountants-don-t-use-erasers.aspx. You basically treat CouchDB/PouchDB as a big append-only storage system. It doesn't work for every use case, but when it does work, you can get great performance benefits – e.g. all documents are generation 1, so we can always use the generation-1 optimization during replication.

— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-54751219.

nolanlawson commented 9 years ago

I agree that revs are a feature, not a bug (although many users would disagree...), but in this case the whole revision management system is being sidestepped for the benefit of speed and of never having to deal with conflicts in the CouchDB sense (i.e. _conflicts).

Of course conflicts can still happen, but as @redgeoff says, that's up to application developers to handle.

Maybe what this plugin could have is a feature like this:

pouch.setMergeStrategy(function (oldDoc, newDoc) {
  // Do some merge logic here. The default is
  // just extend(true, oldDoc, newDoc)
});
redgeoff commented 9 years ago

@nolanlawson thanks for joining this discussion! I'm still not sure that anything should be changed in delta-pouch as there appears to be tradeoffs with considering the rev. Can you further explain your setMergeStrategy idea? How would it differ from what is already implemented in delta-pouch?

calvinmetcalf commented 9 years ago

PouchDB or couchdb without revs already exists it's called levelup. You lose most of the benefits of pouch or couch when you don't use them.

The merge idea makes me think,

I could be approaching this wrong, what if delta pouch took an ID and a function, the function would describe the change you want to make to the document.

E. G. Increment this field or change this one to that if it is this.

Delta pouch could generate a conflict if the doc gets deleted or the function errors, in the case of a conflict after the update happens it just discards the change and reruns the change function with the new doc

deltaPouch(id, function(doc, done) {

doc.value++;

next(null, doc);

}

@nolanlawson https://github.com/nolanlawson thanks for joining this discussion! I'm still not sure that anything should be changed in delta-pouch as there appears to be tradeoffs with considering the rev. Can you further explain your setMergeStrategy idea? How would it differ from what is already implemented in delta-pouch?

— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-54764111.

nolanlawson commented 9 years ago

Can you further explain your setMergeStrategy idea?

It would basically give people an option to override your merge function, in case they want to do something fancier.

PouchDB or couchdb without revs already exists it's called levelup

heh

You lose most of the benefits of pouch or couch when you don't use them.

Not true; we still have sync. And in fact, sync is much faster using the delta-pouch pattern.

The merge idea makes me think

The setMergeStrategy would still cover your use case. It would be e.g.

db.setMergeStrategy(function (oldDoc, newDoc) {
  return {
    value: oldDoc.value + newDoc.value;
  };
});

We have created two APIs to describe the same concept, which means the concept is probably a good one. :)

calvinmetcalf commented 9 years ago

I'm not saying use case I'm saying primary api, and fyi sync without revs isn't very useful it's just asking for race conditions On Sep 7, 2014 7:44 PM, "Nolan Lawson" notifications@github.com wrote:

Can you further explain your setMergeStrategy idea?

It would basically give people an option to override your merge function https://github.com/redgeoff/delta-pouch/blob/master/index.js#L42-L51, in case they want to do something fancier.

PouchDB or couchdb without revs already exists it's called levelup

heh

You lose most of the benefits of pouch or couch when you don't use them.

Not true; we still have sync. And in fact, sync is much faster using the delta-pouch pattern.

The merge idea makes me think

The setMergeStrategy would still cover your use case. It would be e.g.

db.setMergeStrategy(function (oldDoc, newDoc) { return { value: oldDoc.value + newDoc.value; };});

We have create two APIs to describe the same concept, which means the concept is probably a good one. :)

— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-54765279.

wohali commented 9 years ago

just mentioning that sync with a merge function can lead to endless loops as well...see http://mail-archives.apache.org/mod_mbox/couchdb-user/201408.mbox/%3C659CBD44-E8FD-4F34-B7D8-16AE6BFCA225%40apache.org%3E and the rest of the thread for a recent discussion on this over in couchdb land about the problem of resolving conflicts. I haven't read through this proposal in very close detail but it's worth considering all the ways in which this could possibly fail.

redgeoff commented 9 years ago

Closing in favor of ensuring delta-pouch is conflict-free.