Closed hughfdjackson closed 11 years ago
owing to issue discussed in commit e77ef622fbaa4faaaa3bf1a5291e5527996f937d, these need to be re-run.
To Test:
// hash function for strings, based on Java's String.hashCode:
// http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#hashCode()
var hash = function(str){
var h = 0
var l = str.length
for ( var i = 0; i < l; i += 1 )
h = h * 31 + str.charCodeAt(i)
return h
}
// http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#hashCode()
var hash = function(str){
var h = 0
var l = str.length
for ( var i = 0; i < l; i += 1 )
h += str.charCodeAt(i) * 31 * l - i
return h
}
var max32bit = 0x7fffffff
var hash = require('chash')
// to allow hooks for other implementations/tests to override the default
// hash and equality functions (which are the necessary ones for creating
// hash-table-like behaviour, as the hash-trie has), they can be passed in
// as opts to the CRUD functions. The default ones covers the 80% use-case
var defaultOpts = {
eq : function(a, b){ return a === b },
hash: function(str){
return hash(str, max32bit)
}
}
// hash function
var hash = function(s) {
var b = 27183
var a = 31415
var h = 0
var tableSize = 0x7fffff
for (var i = 0; i < s.length; i++) {
h = (a * h + s[i].charCodeAt()) % tableSize;
a = ((a % tableSize) * (b % tableSize)) % (tableSize);
}
return h;
}
// hash function
// based on J. Zobel (2001)
// http://www.seg.rmit.edu.au/code/zwh-ipl/bitwisehash.c
var max32bit = 2 ^ 32
var hash = function(str){
var hash = 0
for ( var i = 0; i < str.length; i += 1 )
hash ^= ( hash << 5) + str[i] + (hash >> 2)
return hash % 32
}
Running hashes with assoc:
hash functions with 1 members
//-------------------------//
Results:
djb2 : 885,816.60 ± 1.56% ops/sec
java : 922,699.86 ± 1.14% ops/sec
bad java : 876,877.02 ± 5.71% ops/sec
chash : 74,287.58 ± 1.15% ops/sec
sedgewick : 385,917.29 ± 1.92% ops/sec
zobel : 47,888.81 ± 0.91% ops/sec
hash functions with 10 members
//-------------------------//
Results:
djb2 : 410,918.13 ± 1.40% ops/sec
java : 192,507.29 ± 1.27% ops/sec
bad java : 384,634.99 ± 1.17% ops/sec
chash : 67,468.85 ± 2.44% ops/sec
sedgewick : 81,896.45 ± 1.43% ops/sec
zobel : 66,858.36 ± 2.02% ops/sec
hash functions with 100 members
//-------------------------//
Results:
djb2 : 122,969.13 ± 1.22% ops/sec
java : 101,937.31 ± 3.32% ops/sec
bad java : 91,606.27 ± 3.02% ops/sec
chash : 14,283.30 ± 1.63% ops/sec
sedgewick : 72,671.63 ± 2.32% ops/sec
zobel : 28,799.55 ± 1.14% ops/sec
hash functions with 1000 members
//-------------------------//
Results:
djb2 : 87,241.73 ± 1.89% ops/sec
java : 61,965.70 ± 1.47% ops/sec
bad java : 71,853.68 ± 2.29% ops/sec
chash : 9,993.05 ± 3.87% ops/sec
sedgewick : 46,216.02 ± 2.94% ops/sec
zobel : 3,974.42 ± 2.50% ops/sec
hash functions with 10000 members
//-------------------------//
Results:
djb2 : 57,905.53 ± 1.16% ops/sec
java : 47,137.80 ± 6.75% ops/sec
bad java : 46,386.83 ± 1.36% ops/sec
chash : 10,158.69 ± 1.76% ops/sec
sedgewick : 38,191.96 ± 1.67% ops/sec
zobel : 464.05 ± 1.76% ops/sec
Second run (nothing changed):
hash functions with 1 members
//-------------------------//
Results:
djb2 : 959,078.63 ± 2.46% ops/sec
java : 796,101.17 ± 1.89% ops/sec
bad java : 815,730.86 ± 1.83% ops/sec
chash : 68,013.71 ± 2.18% ops/sec
sedgewick : 346,898.96 ± 2.59% ops/sec
zobel : 46,871.28 ± 2.45% ops/sec
hash functions with 10 members
//-------------------------//
Results:
djb2 : 353,532.15 ± 2.61% ops/sec
java : 352,739.57 ± 2.22% ops/sec
bad java : 393,115.55 ± 2.59% ops/sec
chash : 62,976.55 ± 2.12% ops/sec
sedgewick : 124,085.42 ± 2.77% ops/sec
zobel : 70,573.84 ± 1.77% ops/sec
hash functions with 100 members
//-------------------------//
Results:
djb2 : 67,071.95 ± 2.04% ops/sec
java : 78,334.71 ± 2.58% ops/sec
bad java : 66,977.90 ± 1.31% ops/sec
chash : 12,828.44 ± 1.36% ops/sec
sedgewick : 137,500.70 ± 1.10% ops/sec
zobel : 29,760.31 ± 0.91% ops/sec
hash functions with 1000 members
//-------------------------//
Results:
djb2 : 76,624.03 ± 1.53% ops/sec
java : 91,677.98 ± 1.67% ops/sec
bad java : 73,777.04 ± 2.31% ops/sec
chash : 9,716.49 ± 1.75% ops/sec
sedgewick : 44,027.53 ± 2.00% ops/sec
zobel : 3,777.90 ± 3.81% ops/sec
hash functions with 10000 members
//-------------------------//
Results:
djb2 : 45,890.18 ± 4.08% ops/sec
java : 57,726.71 ± 2.44% ops/sec
bad java : 53,616.03 ± 1.76% ops/sec
chash : 9,755.20 ± 2.20% ops/sec
sedgewick : 39,602.86 ± 1.41% ops/sec
zobel : 490.77 ± 1.67% ops/sec
This is clearly a two-horse race; time to put them against each other in a prolonged heat
Comparing js implementations of djb2 (as found in the 'string-hash' library) and an implementation of Java's String#hashCode. 'members' refers to the number of items in the trie before another is associated with it (using the hashing function indicated for all operations).
hash functions with 1 members
//-------------------------//
Results:
djb2 : 1,046,354.60 ± 0.75% ops/sec
java : 715,406.34 ± 5.67% ops/sec
hash functions with 3 members
//-------------------------//
Results:
djb2 : 597,648.06 ± 1.40% ops/sec
java : 513,078.12 ± 4.12% ops/sec
hash functions with 5 members
//-------------------------//
Results:
djb2 : 201,509.19 ± 4.81% ops/sec
java : 288,255.80 ± 3.29% ops/sec
hash functions with 10 members
//-------------------------//
Results:
djb2 : 375,951.44 ± 1.93% ops/sec
java : 378,051.41 ± 1.56% ops/sec
hash functions with 50 members
//-------------------------//
Results:
djb2 : 138,231.14 ± 1.37% ops/sec
java : 138,343.91 ± 1.52% ops/sec
hash functions with 100 members
//-------------------------//
Results:
djb2 : 110,511.56 ± 1.71% ops/sec
java : 105,039.06 ± 2.66% ops/sec
hash functions with 500 members
//-------------------------//
Results:
djb2 : 84,991.49 ± 1.16% ops/sec
java : 83,363.07 ± 2.17% ops/sec
hash functions with 1000 members
//-------------------------//
Results:
djb2 : 92,564.42 ± 1.80% ops/sec
java : 94,715.46 ± 1.24% ops/sec
hash functions with 5000 members
//-------------------------//
Results:
djb2 : 65,751.20 ± 1.27% ops/sec
java : 65,629.24 ± 1.35% ops/sec
hash functions with 10000 members
//-------------------------//
Results:
djb2 : 61,354.09 ± 1.66% ops/sec
java : 63,677.28 ± 1.01% ops/sec
hash functions with 50000 members
//-------------------------//
Results:
djb2 : 47,313.94 ± 1.50% ops/sec
java : 46,067.14 ± 2.69% ops/sec
hash functions with 100000 members
//-------------------------//
Results:
djb2 : 38,258.11 ± 1.24% ops/sec
java : 38,007.39 ± 1.85% ops/sec
If there's anything at all in it, djb2 seems marginally faster. Sticking with that
hash functions with 1 members
//-------------------------//
djb2 : 993,088.88 ± 1.20% ops/sec
java : 989,627.42 ± 1.66% ops/sec
hash functions with 3 members
//-------------------------//
djb2 : 591,505.49 ± 2.84% ops/sec
java : 646,243.85 ± 1.43% ops/sec
hash functions with 5 members
//-------------------------//
djb2 : 635,526.65 ± 1.04% ops/sec
java : 632,385.19 ± 2.91% ops/sec
hash functions with 10 members
//-------------------------//
djb2 : 286,055.96 ± 0.94% ops/sec
java : 284,507.54 ± 1.28% ops/sec
hash functions with 50 members
//-------------------------//
djb2 : 111,894.94 ± 6.06% ops/sec
java : 124,489.66 ± 1.11% ops/sec
hash functions with 100 members
//-------------------------//
djb2 : 128,593.56 ± 0.85% ops/sec
java : 127,947.98 ± 1.15% ops/sec
hash functions with 500 members
//-------------------------//
djb2 : 95,663.66 ± 1.66% ops/sec
java : 93,704.14 ± 2.05% ops/sec
hash functions with 1000 members
//-------------------------//
djb2 : 72,985.30 ± 2.51% ops/sec
java : 74,397.36 ± 1.58% ops/sec
hash functions with 5000 members
//-------------------------//
djb2 : 48,049.11 ± 1.19% ops/sec
java : 47,636.50 ± 1.24% ops/sec
hash functions with 10000 members
//-------------------------//
djb2 : 63,720.82 ± 1.29% ops/sec
java : 58,721.75 ± 5.39% ops/sec
hash functions with 50000 members
//-------------------------//
djb2 : 47,853.15 ± 1.26% ops/sec
java : 47,992.99 ± 1.34% ops/sec
hash functions with 100000 members
//-------------------------//
djb2 : 36,559.10 ± 2.18% ops/sec
java : 37,119.72 ± 1.58% ops/sec
Just to be sure, the two should be benchmarked against using /[0-9]*/ keys; since this was a use case that appeared to make djb2 fall over before.
Using int keys. Both neck and neck; sticking with djb2
hash functions with 1 members
//-------------------------//
djb2 : 995,983.41 ± 2.06% ops/sec
java : 994,876.93 ± 1.41% ops/sec
hash functions with 3 members
//-------------------------//
djb2 : 634,895.20 ± 1.25% ops/sec
java : 624,670.64 ± 1.55% ops/sec
hash functions with 5 members
//-------------------------//
djb2 : 596,814.03 ± 1.17% ops/sec
java : 586,561.99 ± 1.92% ops/sec
hash functions with 10 members
//-------------------------//
djb2 : 251,438.20 ± 1.98% ops/sec
java : 262,647.29 ± 1.27% ops/sec
hash functions with 50 members
//-------------------------//
djb2 : 123,931.09 ± 2.22% ops/sec
java : 126,741.95 ± 1.28% ops/sec
hash functions with 100 members
//-------------------------//
djb2 : 69,541.20 ± 1.58% ops/sec
java : 70,173.79 ± 1.06% ops/sec
hash functions with 500 members
//-------------------------//
djb2 : 65,327.74 ± 1.21% ops/sec
java : 65,583.86 ± 0.93% ops/sec
hash functions with 1000 members
//-------------------------//
djb2 : 70,202.74 ± 2.82% ops/sec
java : 70,752.70 ± 2.19% ops/sec
hash functions with 5000 members
//-------------------------//
djb2 : 54,216.69 ± 4.23% ops/sec
java : 58,019.67 ± 3.08% ops/sec
hash functions with 10000 members
//-------------------------//
djb2 : 44,893.01 ± 1.90% ops/sec
java : 46,387.61 ± 1.16% ops/sec
hash functions with 50000 members
//-------------------------//
djb2 : 38,634.85 ± 1.45% ops/sec
java : 38,032.10 ± 1.89% ops/sec
hash functions with 100000 members
//-------------------------//
djb2 : 30,307.27 ± 2.52% ops/sec
java : 30,306.57 ± 2.50% ops/sec
Test the relative perf of different hashing functions, to pick one for use as the default hash. Should produce a hash usable with hashMask (i.e. one that produces at least a 32 bit hash)