mscdex / node-xxhash

An xxhash binding for node.js
Other
193 stars 28 forks source link

hash64 seed fixes + byte order consistency changes #14

Closed laverdet closed 7 years ago

laverdet commented 8 years ago

Attached are 2 changes, one is pretty straight forward, the other is a breaking change but one I think needs to happen.

First is 6a183008e941a58e6810c4e5c52a59754d37dbb7: convert_seed in the 64-bit version currently uses uint32_t. This was a no brainer.

Additionally included are changes to thrown exceptions. The seed_val.Clear() trick and later check to info[0].IsEmpty() never worked. Since you're passing a copy of the v8::Local<> you only clear the handle in the local scope. In the case that convert_seed throws, you still end up calculating a hash and then it just so happens that the correct exception is thrown anyway. I fixed that to correctly detect the exception and return early.

The second change is 21b13985ab4856e8a59dda99cb4185c27064ed29: Right now when you pass a buffer in as a seed value you are converting it to an uint32_t or uint64_t using big-endian format (the loop where you do seed <<= 8). But then when you return the hash you cast it to a char* and return that as-is. On x86 that will be little-endian, on other platforms it may be big-endian.

I updated the hash functions to always return the hash in big-endian format (which currently matches what convert_seed expects). Since it is not uncommon to calculate a hash, and then to use that hash as a seed for another hash the endianness should at least be internally consistent.

So with these changes we should expect the results of node-xxhash to be the same on all platforms, and using the result of hash as a seed for another hash to work as expected.

pxwise commented 7 years ago

@mscdex these look like some solid improvements. Any reservations on merging this so npmland can take advantage?

mscdex commented 7 years ago

@pxwise I missed this. However, after briefly looking over the changes, I am a bit concerned about the change in output. @laverdet Can we keep the old format or is the change inevitable?

Also I'm not sure why ws2_32.lib is needed, but if we absolutely need it, it should go in binding.gyp instead of in a pragma.

laverdet commented 7 years ago

The change to output is necessary. As mentioned in the original comment, the seed input is always big-endian but your output is platform-specific (little-endian on most systems).

Consider the following:

let payload = new Buffer('payload');

let result32 = xxhash.hash(payload, 1);
let resultBuffer = new Buffer(4);
xxhash.hash(payload, 1, resultBuffer);

console.log(xxhash.hash(payload, result32) == xxhash.hash(payload, resultBuffer));

This demonstrates the endian inconsistencies. This logs "false" currently, but after my fix it returns "true".

Winsock wasn't necessary I was just mad that htonl() was making the build fail and patched it in a stupid way. I rewrote that stuff to remove the silly endian checking and reinterpret_cast's entirely. Pull request should be updating now.

mscdex commented 7 years ago

I went about this in a slightly different way in b9980c2 (by taking advantage of some xxhash APIs after an upstream xxhash upgrade). I also opted to use little endian as the common endianness since little endian is much more common and thus would affect less users. I tested these changes on a big endian system via qemu and all tests pass as-is now.

Thanks again for the PR and responses. Sorry it took so long.