Closed calvinmetcalf closed 9 years ago
Thanks for the feedback, but unfortunately I don't think I understand enough about the way pouch/couch handles revs to visualize what you are suggesting. In my mind a timeline might look like this (in chronological order):
clients 1 & 2 are offline
client 1:
{ 'location': 'home'}
client 2:
{ 'location': 'bar', 'activity': 'getting drunk'}
client 1:
{ 'activity' : 'reading'}
clients 1 & 2 come back online and sync
Yes, the result would be the latest changes, e.g.
{ 'location': 'bar', 'activity': 'reading'}
but I consider this the expected behavior in a collaborative environment where we honor the latest changes. In reality, this type of thing probably wouldn't happen much because if say our docs were actions
, it wouldn't make much sense for a user to completely change the details of the action
to something else, e.g. I would expect flows more like
{ 'location': 'bar' }
{ 'activity': 'having a drink'}
{ 'activity': 'getting drunk'}
I'd love to hear your opinion if you think differently.
a couple things to note, they don't have to be both offline they can both be syncing their changes to a 3rd database
the whole _rev thing is for this situation, basically
client 1:
{ 'location': 'home', 'activity': 'reading'}
server: this is rev 1 of the document
client 2: I have rev 1 please update it to now be:
{ 'location': 'bar', 'activity': 'getting drunk'}
server: ok this is rev 2
client 1: I have rev 1 please update it like this
{ 'location' : 'library'}
server: woah that's no longer the current rev the document has changed since then.
this way you know your updating what you mean to be updating even if the document were to change between when you sent the request and the database received it
This definitely has me thinking... what if I kept track of the latest rev for each attribute? But then how should the following play out?
client 1 & 2 offline
client1: { name: 'John Smith'}
client2: { name: 'Jack Brown' }
client1: { telephone: '646-11-SMITH' }
client2: { email: 'brown@example.com' }
client 1 syncs
client 2 syncs
should the result be:
{ name: 'John Smith', telephone: '646-11-SMITH', email: 'brown@example.com' }
or
{ name: 'John Smith', telephone: '646-11-SMITH' }
You could instead of deleting field's replace them with $deleted so of there is something in the field of takes precedence over nothing, but removing a field is treated like a value On Aug 23, 2014 4:46 PM, "Geoff Cox" notifications@github.com wrote:
This definitely has me thinking... what if I kept track of the latest rev for each attribute? But then how should the following play out?
client 1 & 2 offlineclient1: { name: 'John Smith'}client2: { name: 'Jack Brown' }client1: { telephone: '646-11-SMITH' }client2: { email: 'brown@example.com' }client 1 syncsclient 2 syncs
should the result be:
{ name: 'John Smith', telephone: '646-11-SMITH', email: 'brown@example.com' }
or
{ name: 'John Smith', telephone: '646-11-SMITH' }
— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-53165080.
Sure, but that's not the detail I'm worried about. Instead, my point is essentially whether the revs need to be tracked at the individual attribute layer or at the doc layer, e.g. in my example would you expect the end merged result to be
{ name: 'John Smith', telephone: '646-11-SMITH', email: 'brown@example.com' }
OR
{ name: 'John Smith', telephone: '646-11-SMITH' }
So what you could do is that if the rev is not current, fetch the old rev (if it's available) and see which fields have changed so only reject then if they did On Aug 23, 2014 6:05 PM, "Geoff Cox" notifications@github.com wrote:
Sure, but that's not the detail I'm worried about. Instead, my point is essentially whether the revs need to be tracked at the individual attribute layer or at the doc layer, e.g. in my example would you expect the end merged result to be
{ name: 'John Smith', telephone: '646-11-SMITH', email: 'brown@example.com' }
OR
{ name: 'John Smith', telephone: '646-11-SMITH' }
— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-53167754.
yes, but I'm still trying to determine either A: { name: 'John Smith', telephone: '646-11-SMITH', email: ' brown@example.com' } OR B: { name: 'John Smith', telephone: '646-11-SMITH' }
On Sat, Aug 23, 2014 at 3:38 PM, Calvin Metcalf notifications@github.com wrote:
So what you could do is that if the rev is not current, fetch the old rev (if it's available) and see which fields have changed so only reject then if they did On Aug 23, 2014 6:05 PM, "Geoff Cox" notifications@github.com wrote:
Sure, but that's not the detail I'm worried about. Instead, my point is essentially whether the revs need to be tracked at the individual attribute layer or at the doc layer, e.g. in my example would you expect the end merged result to be
{ name: 'John Smith', telephone: '646-11-SMITH', email: ' brown@example.com' }
OR
{ name: 'John Smith', telephone: '646-11-SMITH' }
— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-53167754.
— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-53168797.
Good point, yeah just throw an error if the document has changed don't add the inconsistent email if the name has changed On Aug 23, 2014 6:41 PM, "Geoff Cox" notifications@github.com wrote:
yes, but I'm still trying to determine either A: { name: 'John Smith', telephone: '646-11-SMITH', email: ' brown@example.com' } OR B: { name: 'John Smith', telephone: '646-11-SMITH' }
On Sat, Aug 23, 2014 at 3:38 PM, Calvin Metcalf notifications@github.com
wrote:
So what you could do is that if the rev is not current, fetch the old rev (if it's available) and see which fields have changed so only reject then if they did On Aug 23, 2014 6:05 PM, "Geoff Cox" notifications@github.com wrote:
Sure, but that's not the detail I'm worried about. Instead, my point is essentially whether the revs need to be tracked at the individual attribute layer or at the doc layer, e.g. in my example would you expect the end merged result to be
{ name: 'John Smith', telephone: '646-11-SMITH', email: ' brown@example.com' }
OR
{ name: 'John Smith', telephone: '646-11-SMITH' }
— Reply to this email directly or view it on GitHub < https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-53167754>.
— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-53168797.
— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-53168891.
So, that's the point. If any change invalidates a rev for the entire doc then what's the point of delta pouch?
I think the key may be to leave delta pouch the way it is and expect the UI to communicate the conflicts.
@calvinmetcalf I think the goal of delta-pouch is to achieve better performance by making exactly the sacrifice you describe – conflicts can no longer be handled in a Couch-like way. In fact delta-pouch will never throw a 409 if I understand correctly.
Fetching the rev wouldn't even solve the problem, really, because two databases could still get out of sync. What if two clients are offline? They can fetch the "latest rev," but each one sees their own version of that, so ultimately they will push conflicts to the database. In delta-pouch's system, it's just a basic "last write wins."
The idea is something like what @wohali describes at the end of this blog post or as summed up by Accountants Don't Use Erasers. You basically treat CouchDB/PouchDB as a big append-only storage system. It doesn't work for every use case, but when it does work, you can get great performance benefits – e.g. all documents are generation 1, so we can always use the generation-1 optimization during replication.
Yeah I was wrong about fetching specific revs. But we still have to fetch the old version so we can at least check the rev, revs while annoying are a feature not a bug in couch/pouch and that becomes more important when you are doing diffs, this is less of a case of 2 off line clients as opposed to 2 online local dbs both replicating to a remote database. We want to avoid situations where the doc you are editing gets updates out from under you and you just overwrite the change. On Sep 7, 2014 12:07 PM, "Nolan Lawson" notifications@github.com wrote:
@calvinmetcalf https://github.com/calvinmetcalf I think the goal of delta-pouch is to achieve better performance by making exactly the sacrifice you describe – conflicts can no longer be handled in a Couch-like way. In fact delta-pouch will never throw a 409 if I understand correctly.
Fetching the rev wouldn't even solve the problem, really, because two databases could still get out of sync. What if two clients are offline? They can fetch the "latest rev," but each one sees their own version of that, so ultimately they will push conflicts to the database. In delta-pouch's system, it's just a basic "last write wins."
The idea is something like what @wohali https://github.com/wohali describes at the end of this blog post http://atypical.net/archive/2014/04/17/understanding-race-induced-conflicts-in-bigcouch or as summed up by Accountants Don't Use Erasers http://blogs.msdn.com/b/pathelland/archive/2007/06/14/accountants-don-t-use-erasers.aspx. You basically treat CouchDB/PouchDB as a big append-only storage system. It doesn't work for every use case, but when it does work, you can get great performance benefits – e.g. all documents are generation 1, so we can always use the generation-1 optimization during replication.
— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-54751219.
I agree that revs are a feature, not a bug (although many users would disagree...), but in this case the whole revision management system is being sidestepped for the benefit of speed and of never having to deal with conflicts in the CouchDB sense (i.e. _conflicts
).
Of course conflicts can still happen, but as @redgeoff says, that's up to application developers to handle.
Maybe what this plugin could have is a feature like this:
pouch.setMergeStrategy(function (oldDoc, newDoc) {
// Do some merge logic here. The default is
// just extend(true, oldDoc, newDoc)
});
@nolanlawson thanks for joining this discussion! I'm still not sure that anything should be changed in delta-pouch as there appears to be tradeoffs with considering the rev. Can you further explain your setMergeStrategy
idea? How would it differ from what is already implemented in delta-pouch?
PouchDB or couchdb without revs already exists it's called levelup. You lose most of the benefits of pouch or couch when you don't use them.
The merge idea makes me think,
I could be approaching this wrong, what if delta pouch took an ID and a function, the function would describe the change you want to make to the document.
E. G. Increment this field or change this one to that if it is this.
Delta pouch could generate a conflict if the doc gets deleted or the function errors, in the case of a conflict after the update happens it just discards the change and reruns the change function with the new doc
deltaPouch(id, function(doc, done) {
doc.value++;
next(null, doc);
}
@nolanlawson https://github.com/nolanlawson thanks for joining this discussion! I'm still not sure that anything should be changed in delta-pouch as there appears to be tradeoffs with considering the rev. Can you further explain your setMergeStrategy idea? How would it differ from what is already implemented in delta-pouch?
— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-54764111.
Can you further explain your
setMergeStrategy
idea?
It would basically give people an option to override your merge function, in case they want to do something fancier.
PouchDB or couchdb without revs already exists it's called levelup
heh
You lose most of the benefits of pouch or couch when you don't use them.
Not true; we still have sync. And in fact, sync is much faster using the delta-pouch pattern.
The merge idea makes me think
The setMergeStrategy
would still cover your use case. It would be e.g.
db.setMergeStrategy(function (oldDoc, newDoc) {
return {
value: oldDoc.value + newDoc.value;
};
});
We have created two APIs to describe the same concept, which means the concept is probably a good one. :)
I'm not saying use case I'm saying primary api, and fyi sync without revs isn't very useful it's just asking for race conditions On Sep 7, 2014 7:44 PM, "Nolan Lawson" notifications@github.com wrote:
Can you further explain your setMergeStrategy idea?
It would basically give people an option to override your merge function https://github.com/redgeoff/delta-pouch/blob/master/index.js#L42-L51, in case they want to do something fancier.
PouchDB or couchdb without revs already exists it's called levelup
heh
You lose most of the benefits of pouch or couch when you don't use them.
Not true; we still have sync. And in fact, sync is much faster using the delta-pouch pattern.
The merge idea makes me think
The setMergeStrategy would still cover your use case. It would be e.g.
db.setMergeStrategy(function (oldDoc, newDoc) { return { value: oldDoc.value + newDoc.value; };});
We have create two APIs to describe the same concept, which means the concept is probably a good one. :)
— Reply to this email directly or view it on GitHub https://github.com/redgeoff/delta-pouch/issues/32#issuecomment-54765279.
just mentioning that sync with a merge function can lead to endless loops as well...see http://mail-archives.apache.org/mod_mbox/couchdb-user/201408.mbox/%3C659CBD44-E8FD-4F34-B7D8-16AE6BFCA225%40apache.org%3E and the rest of the thread for a recent discussion on this over in couchdb land about the problem of resolving conflicts. I haven't read through this proposal in very close detail but it's worth considering all the ways in which this could possibly fail.
Closing in favor of ensuring delta-pouch is conflict-free.
feature request: require revs, currently as this is I wouldn't want to use it in production because if there are other edits being made they can over write each other in unpredictable ways, an even better way to do it might be to have it only have it throw a conflict if the fields we are modifying have been updated since the rev I supplied. The second part is less important then the first because think about
you put in a document
then you update it
but if somewhat was to delta pouch to express that they are reading at the library having seen the first one and not the second they would change it to