Closed GoogleCodeExporter closed 9 years ago
Thanks for turning me on to "murmur"! This has generated no small amount of
interest
at work--it could be a boon to memcached.
But please note: Python's string hashing function has been tuned to Python's
workload, particularly wrt dicts. For example, strings where you "increment"
the
last character ("abc" to "abd", or "100" to "101") hash to the same value + 1.
This
would normally be considered a terrible hash algorithm as its avalanching is so
rotten. But it works well for Python dicts because it gives '"better than
random"
dict collision behavior for input strings very close together'. I cite Tim
Peters:
http://www.mail-archive.com/python-3000@python.org/msg10194.html
See also the big comment near the top of "dictobject.c".
Not that you shouldn't experiment! Just that you should tread lightly and
benchmark
the hell out of it.
Cheers,
/larry/
Original comment by exoticma...@gmail.com
on 20 Sep 2009 at 11:37
Right, we believe the current hash/hashtable is good, but it's always worth
experimenting, especially as new hash techniques are developed. I've seen the
""namea", "nameb", "namec", and "named" hash very closely, which means X, Y and
Z"
example in several places, but I'd be curious how many performance-critical
dicts have
such keys with those characteristics.
Original comment by collinw
on 21 Sep 2009 at 6:57
Original comment by collinw
on 6 Jan 2010 at 11:43
Original comment by collinw
on 19 Feb 2010 at 9:43
Original issue reported on code.google.com by
collinw
on 18 Sep 2009 at 7:59