pouchdb / upsert

PouchDB plugin for upsert() and putIfNotExists() functions
Apache License 2.0
149 stars 25 forks source link

remove _rev property from docs & fix doc mutation #3

Closed notslang closed 9 years ago

notslang commented 9 years ago

closes #2 and makes some improvements to the readme. this would require a major version bump.

nolanlawson commented 9 years ago

I tend to oppose this commit for a few reasons:

I totally understand where you're coming from with wanting to do your own deep comparison and not have to worry about _rev, but I think the structured cloning fix along with delete doc._rev inside of your diffFun is a fine solution to that. So I'd like to merge your cloning fixes and README fixes, but not the other stuff. I'll propose a commit on top of yours this morning.

notslang commented 9 years ago

For doing a particular action if the document exists vs. if it doesn't, couldn't you just check if doc._id exists?

nolanlawson commented 9 years ago

Yes, but to me, it's clearer to check the _rev, since that's what manages the revision history.

notslang commented 9 years ago

Alright, I added a test to demonstrate the 2nd kind of doc mutation that the _.clone is there to eliminate.

As for the removal of _rev, I find it clearer to check for the existence of _id because I'm asking if the doc exists, not if the doc has history... Of course, that's just semantics and either way would be understandable. Also, I might just be used to _id from my years using mongo.

But, if we got rid of _rev then there is a tangible benefit: it would be easier to compare new docs with old docs using _.isEqual, which is something I find myself doing quite often.

Also, it does seem a little weird to have a property that the mutation of does nothing to change the result. Of course. this is true of _id too, but I don't think we could get rid of that.

nolanlawson commented 9 years ago

That's a good point, however I think your 'should not mutate local vars' test can be fixed without omitting the _revs. I understand that you want to do deep comparisons, but I'd like to optimize for performance and clarity here (ideally you shouldn't even need to deep comparison/cloning), and I just can't imagine that users will find it intuitive that the input doc is not identical to the one they would get from get(). It also removes the possibility that users can write functions that, for instance, react differently for first-gen docs 1- vs second-gen docs 2-.

I understand where you're coming from, but I think I'm going to be that jerk and close this pull request in favor of the other solution. I encourage you to submit a pull request to fix the 'should not mutate local vars' test, or if you don't mind I can simply borrow your test and write my own fix (happy to give you credit in the commit log, of course; I can change the author to you).

Thanks for your input!

nolanlawson commented 9 years ago

Foillow-up: https://github.com/pouchdb/upsert/issues/7

notslang commented 9 years ago

Yeah, feel free to use the test case (as well as any other code you want).

And yeah clone can be used without omitting _rev, in fact that test should still work in exactly the same way if _rev is included in doc since it only checks doc.value !== newDoc.value, and so long as newDoc isn't mutated the test will pass.

notslang commented 9 years ago

As for changing behaviour based on the _rev - is the format of _rev actually standardized / guaranteed?

nolanlawson commented 9 years ago

@slang800 Yep, all CouchDB implementations have the format <integer>-<hash>. So it's guaranteed that 1-x refers to the 1st generation, 2-y to the second generation, through 10-z and 100-w etc. We actually use it a lot in the PouchDB code to optimize for special cases with first-gen docs (e.g. during replication and map/reduce, where we know there's zero history, and so we can take shortcuts).

The only part that's quasi-standard is what comes after the hyphen. PouchDB just generates a random number, whereas CouchDB calculates a checksum of the entire document in order to dedup duplicate changes across multiple nodes. There's very little harm in us implementing it the way we do, and in fact it would be impossible, since the checksum implementation requires Erlang strings. But that's a whole other discussion; there's an open issue somewhere in CouchDB to make the revs truly deterministic. :)