mirek / node-json-hash

JSON hash
24 stars 4 forks source link

Added improvements to array hashing #4

Closed ramesees closed 8 years ago

ramesees commented 8 years ago

Hi

I was using your library and discovered that array hashing was failing for some use cases that I felt should have produced the same hash. So I've made some changes that I would like you to review.

The changes I've added now sort each array according to the hash of each element and then proceeds as it did before.

Test cases have been added and I've bumped the version to 1.2.0 to reflect the functionality change.

mirek commented 8 years ago

Thanks for input.

If you want to compute hash for sets that you encode as arrays (it looks like that's what you're trying to do) you can:

  1. map an array to object with keys as stringified indices and values unchanged - and compute hash on that, or
  2. we can talk about how to add support for it in the library directly

To add support in the library we can't treat all arrays as sets, in many (most?) cases people want to distinguish between [1, 2] and [2, 1] - they are different arrays. But json doesn't have set data type unfortunately so sometimes you want to use set and back it up by array.

For that case we can:

  1. maybe introduce options.arraysAreSets = true | false option. The downside is that with this approach all arrays in hashed document will be treated as sets. Sometimes it's ok, sometimes it's not.
  2. maybe introduce options.hasher = function which, if supplied, can override default behaviour (the function would need to get in params value and key path) - when returns undefined, it'll just do default hashing, but if result is not undefined it'll use that as hash. this way you could return hash for sorted arrays instead of arrays... but this needs to be designed correctly. Ideally you'd be able to easily attach hasher just by extended type ("extended" meaning it would support 'array' - because typeof [] === object so normal typeof is too limited for that), attaching hasher per key path (with wildcards so you can do it for nested elements in arrays).
  3. another option, maybe the best one, is to create new symbol in the library toHash and use it as interface - if defined on object myObject[jsonHash.toHash] = function () { return whatever; } then it's going to use that as a hash. In there you can sort and return hash for that.
ramesees commented 8 years ago

Thanks for the feedback.

The use cases that I require the hashing functionality for are dynamic json structures which are highly nested which contain objects, arrays and primitives. They can range from several hundred kilobytes to several megabytes in size.

Options 2 or 3 wouldn't be practical with the nature of these structures as the idea would be to use the hashing library as a fire and forget with minimal configuration to obtain a hash value which can be used to compare these structures at a later date.

Option 1 (of adding it to the library) seems to be best solution for me for right now, as the ordering of the values in the arrays has no bearing on its meaning.

mirek commented 8 years ago

Released version 1.2.0, please let know if it works for you. The option is { sets: true }.