mscdex / node-xxhash

An xxhash binding for node.js

64bit hashing a string #13

Closed Bramzor closed 8 years ago

Bramzor commented 8 years ago

I'm trying to use your module to hash a string. I tried the following, but I can't get it to work correctly, apparently because it requires a buffer instead of a string: XHash.hash64(new Buffer(data.toLowerCase()), seed).toString(); This returns something like: \xef\xbf\xbd\xef\xbf\xbd....

mscdex commented 8 years ago

When you decode binary data (what hash64() returns by default) to a UTF-8 string (via .toString()), you run the risk of encountering byte sequences that are not valid in UTF-8. That is what is happening in this case, as you can see from the series of 0xef, 0xbf, 0xbd bytes. Those bytes represent the UTF-8 replacement character (\uFFFD), which is substituted for invalid UTF-8 byte sequences found during the decoding process.

Instead, you can either pass a different output encoding as the third parameter, or pass an encoding to .toString(). The encoding you choose should be one that keeps binary data intact (e.g. 'hex' or 'base64').
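To illustrate the difference between encodings, here is a minimal sketch that uses only Node's built-in Buffer, so it does not call this module at all. The eight bytes are the ones reported in this thread as the raw hash output; the variable names are made up for the example.

```javascript
// Raw hash bytes as reported in this thread (not computed here).
const hashBytes = Buffer.from([227, 107, 211, 121, 232, 51, 217, 85]);

// Default decoding is UTF-8: invalid sequences become U+FFFD.
const asUtf8 = hashBytes.toString();

// 'hex' and 'base64' keep every byte intact.
const asHex = hashBytes.toString('hex');
const asBase64 = hashBytes.toString('base64');

console.log(JSON.stringify(asUtf8)); // contains replacement characters
console.log(asHex);                  // e36bd379e833d955
console.log(asBase64);

// A hex string can be turned back into the original bytes...
const roundTrip = Buffer.from(asHex, 'hex');
console.log(roundTrip.equals(hashBytes)); // true

// ...but the UTF-8 string cannot, because information was lost.
console.log(Buffer.from(asUtf8).equals(hashBytes)); // false
```

The round-trip check at the end is the practical test for whether an encoding is safe for binary data.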

Bramzor commented 8 years ago

What I actually want to do is just hash a string, not binary data. But I'm forced to provide a buffer because the module doesn't accept a string.

If I remove the toString() and create a JSON of the output after hashing, I get: {"type":"Buffer","data":[227,107,211,121,232,51,217,85]} What I'm looking for is something like: 2844c4aa8ad49a19

mscdex commented 8 years ago

As I said, you can specify a better (output) encoding two different ways:

XHash.hash64(new Buffer(data.toLowerCase()), seed).toString('hex');

or:

XHash.hash64(new Buffer(data.toLowerCase()), seed, 'hex');

The former returns a Buffer containing the hash and then calls buffer.toString() to convert the binary hash contents to something printable.

The latter converts the contents internally and directly returns a string of the passed encoding.

Bramzor commented 8 years ago

That did it. Wouldn't have been able to figure this out myself for some reason. Thanks a lot! Maybe an idea to add this to the readme :)

mscdex commented 8 years ago

It is already in the readme :-)

Bramzor commented 8 years ago

Actually the issue was not fixed at all. I'm using https://asecuritysite.com/encryption/xxHash to validate the result, but because I'm passing the data in as a buffer, the hash result is something completely different. So I assume there is no proper way to hash a string with this implementation?

mscdex commented 8 years ago

The values I tried actually match those coming from that site, despite being a different version of xxHash. The difference is that the website is explicitly converting at least the 64-bit values to big endian format. If you reverse the byte order of the hex result you will see it matches what this module returns. I suppose I could add a third parameter that converts to big endian if the host CPU is little endian....
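For anyone comparing against a big endian reference, a rough sketch of the byte-order reversal, using only Node's built-in Buffer. The helper name swapHexEndianness is hypothetical and not part of this module, and the input is the example byte sequence from earlier in the thread, assumed here to be in little endian order.

```javascript
// Hypothetical helper: reverse the byte order of a hex-encoded hash.
// Reverses whole bytes (two hex digits each), not single characters.
function swapHexEndianness(hex) {
  const bytes = Buffer.from(hex, 'hex');
  // Copy first, because reverse() works in place.
  return Buffer.from(bytes).reverse().toString('hex');
}

// Example bytes from this thread, assumed to be little endian.
const littleEndianHex = 'e36bd379e833d955';
const bigEndianHex = swapHexEndianness(littleEndianHex);
console.log(bigEndianHex); // 55d933e879d36be3

// Same result by reading the 8 bytes as a little endian 64-bit integer
// (readBigUInt64LE needs a Node version with BigInt support).
const asInteger = Buffer.from(littleEndianHex, 'hex').readBigUInt64LE(0);
const viaBigInt = asInteger.toString(16).padStart(16, '0');
console.log(viaBigInt === bigEndianHex); // true
```

Applying the helper twice returns the original string, so it can be used in either direction when checking results against another implementation.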