This is legit, and while I know a lot of work went into ImmutableSet, I do
think this demonstrates that it's still too easy to trigger the worst case of
linear probing.
I'm currently experimenting with the following smearing function:
static int smear(int hashCode) {
    hashCode ^= Integer.rotateRight(hashCode, 20) ^ Integer.rotateRight(hashCode, 12);
    hashCode ^= Integer.rotateRight(hashCode, 7) ^ Integer.rotateRight(hashCode, 4);
    hashCode *= 0x5BD1E995;
    return hashCode;
}
(The multiplier constant is borrowed from murmur hash, which I suspected would
probably yield good results.)
I'm experimenting with quadratic probing at the moment with this smear
function, just because I want to avoid additional branches, and quadratic
probing retains pretty good locality.
Original comment by lowas...@google.com
on 27 Sep 2012 at 10:58
What's the latest rehash function being used in ConcurrentHashMapV8.java and so
forth? Doug changes it often and we might improve just by trying whatever he's
doing nowadays.
I also think double hashing sounds very worth trying -- all those bits sitting
there in the upper reaches of the "smeared" hash code, unused. (Maybe double
hashing isn't the right term here, not sure, perhaps this is cuckoo hashing
with N=2.)
Original comment by kevinb@google.com
on 27 Sep 2012 at 11:30
From
http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/ConcurrentHashMapV8.java?revision=1.62&view=markup :
/**
 * Spreads higher bits to lower, and also forces top 2 bits to 0.
 * Because the table uses power-of-two masking, sets of hashes
 * that vary only in bits above the current mask will always
 * collide. (Among known examples are sets of Float keys holding
 * consecutive whole numbers in small tables.) To counter this,
 * we apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed across bits (so don't benefit
 * from spreading), and because we use trees to handle large sets
 * of collisions in bins, we don't need excessively high quality.
 */
private static final int spread(int h) {
    h ^= (h >>> 18) ^ (h >>> 12);
    return (h ^ (h >>> 10)) & HASH_BITS;
}
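To make the "consecutive Float keys" example from that javadoc concrete, here is a small sketch of my own (the class name `SpreadDemo` and the 16-slot table are illustration choices, not anything from the quoted source): whole-number Float hash codes differ only in bits above a small table's mask, so raw masking piles them all into one slot, while `spread` separates them.

```java
import java.util.HashSet;
import java.util.Set;

public class SpreadDemo {
    static final int HASH_BITS = 0x7fffffff; // as defined in ConcurrentHashMapV8

    // The spread function quoted above.
    static int spread(int h) {
        h ^= (h >>> 18) ^ (h >>> 12);
        return (h ^ (h >>> 10)) & HASH_BITS;
    }

    public static void main(String[] args) {
        int mask = 15; // a 16-slot table
        Set<Integer> rawSlots = new HashSet<>();
        Set<Integer> spreadSlots = new HashSet<>();
        for (float f = 1.0f; f <= 16.0f; f++) {
            int h = Float.valueOf(f).hashCode(); // == Float.floatToIntBits(f)
            rawSlots.add(h & mask);
            spreadSlots.add(spread(h) & mask);
        }
        // The 16 whole-number floats all have zero low-order hash bits, so
        // raw masking collapses them into one slot; spread() separates them.
        System.out.println("raw slots: " + rawSlots.size()
            + ", spread slots: " + spreadSlots.size());
    }
}
```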
But honestly, part of the issue is that linear or quadratic open addressing is
vulnerable not just to exact hash collisions, but to hashes that differ only in
relatively low bits. The smearing methods don't do anything at all to help
there, and that's the issue that needs addressing.
(I'm pretty strongly inclined to keep linear or quadratic probing for the first
couple of probes at least, just to take advantage of memory locality.)
The experimental smearing method I proposed above is doing pretty well in
benchmarks at the moment, which is nice -- and it's doing well with both linear
and quadratic probing. Will keep y'all updated.
Original comment by lowas...@google.com
on 27 Sep 2012 at 11:42
@Louis: Note that unlike the right shifts, the rotations give you no guarantee that "hashCodes that differ only by constant multiples at each bit position have a bounded number of collisions". Or maybe they do, but I can't see how.
> and quadratic probing retains pretty good locality.
I don't think so. It has good locality in the first few steps (which is good as
you quite often stop there), but on the average it's quite bad. Assuming the
probe sequence
0, 1, 3, 6, 10, 15
the distances are
1, 2, 3, 4, ...
which rarely hits the same cache line on the next step.
The classical double hashing has distances
N, N, N, ...
(where N is derived from the hash code and must be odd for power of two table
sizes)
which for most values of N never hits the same cache line on the next step.
What I'd propose is using distances
1, 1, 1, ..., 1, N, 1, 1, 1, ..., 1, N, ....
which most of the time hits the same cache line on the next step. Moreover, in
the beginning it's as efficient as the linear reprobing.
In order for the probe sequence to cover the whole table it's sufficient that
- the block length K is a power of two
- (K-1) + N is an odd multiple of K
Explanation: A block consists of K-1 ones and a single N, so the distance
travelled after one block is (K-1)*1 + 1*N. Moving by an odd multiple of the
block length guarantees that all blocks get visited (as an odd number is
co-prime to the number of blocks which is a power of two).
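A quick sketch of this scheme (my own code; the parameters K=8, N=17, table size 64 are example values chosen so that (K-1)+N = 24 = 3*K, an odd multiple of K) that generates the probe sequence and checks that it visits every slot:

```java
public class BlockProbeDemo {
    // Probe with distances 1,1,...,1,N: K-1 unit steps, then a jump of N.
    static int[] probeSequence(int start, int tableSize, int k, int n) {
        int[] order = new int[tableSize];
        int idx = start;
        for (int i = 0; i < tableSize; i++) {
            order[i] = idx;
            int step = ((i + 1) % k == 0) ? n : 1; // every K-th distance is N
            idx = (idx + step) & (tableSize - 1);  // power-of-two table mask
        }
        return order;
    }

    public static void main(String[] args) {
        boolean[] seen = new boolean[64];
        for (int slot : probeSequence(5, 64, 8, 17)) seen[slot] = true;
        for (boolean b : seen) {
            if (!b) throw new AssertionError("some slot was never probed");
        }
        System.out.println("all 64 slots visited"); // full-table coverage
    }
}
```

Seven of every eight steps stay within a run of consecutive slots, so most probes hit the same cache line as the previous one, which is the locality argument made above.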
@Kevin:
> I also think double hashing sounds very worth trying -- all those bits
sitting there in the upper reaches of the "smeared" hash code, unused.
> (Maybe double hashing isn't the right term here, not sure, perhaps this is
cuckoo hashing with N=2.)
Agreed; while most hashing algorithms use only `log2(table.length)` bits, both double and cuckoo hashing can use nearly twice as many.
This means that for tables bigger than 2**16, all bits get used -- somehow.
I think you meant double hashing, as cuckoo hashing works quite differently.
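As a concrete sketch of how double hashing can use those upper bits (my own construction, not anything in Guava): take the home slot from the low bits of the smeared hash and the probe step from the upper 16 bits, forcing the step odd so it is co-prime to a power-of-two table size and the probe cycle covers every slot.

```java
public class DoubleHashDemo {
    static int indexFor(int hash, int mask) {
        return hash & mask; // low log2(tableSize) bits pick the home slot
    }

    static int stepFor(int hash, int tableSize) {
        // Upper 16 bits pick the step; forcing it odd makes it co-prime to
        // the power-of-two table size, so the probe cycle covers the table.
        return ((hash >>> 16) % tableSize) | 1;
    }

    public static void main(String[] args) {
        int tableSize = 16, mask = tableSize - 1, hash = 0xABCD1234;
        boolean[] seen = new boolean[tableSize];
        int i = indexFor(hash, mask), step = stepFor(hash, tableSize);
        for (int p = 0; p < tableSize; p++) {
            seen[i] = true;
            i = (i + step) & mask;
        }
        for (boolean b : seen) {
            if (!b) throw new AssertionError("slot missed");
        }
        System.out.println("step " + step + " visits all " + tableSize + " slots");
    }
}
```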
Actually, I've just had a different idea, very simple and maybe optimal for the issue I described. I'll try it out and report later.
Original comment by Maaarti...@gmail.com
on 29 Sep 2012 at 7:24
Here's the idea I've already announced: during `construct`, compute the maximum number of probes needed and store it in the field `maxProbes`. As there are no collisions in the case I reported, `maxProbes` equals 1, and thus `RegularImmutableSet.contains` can return false after having inspected a single slot, so it works perfectly.
However, there's a similar problem involving a lot of collisions, where it can't help at all (see the attachment). The benchmark is here:
https://dl.dropbox.com/u/4971686/published/maaartin/guava/ImmutableSetBenchmark1.java
Note that with `collide==true` there are exactly 2 items per slot, but `maxProbes` gets big due to linear reprobing. You may want to make `absent` shorter in case it aborts due to the caliper time limit.
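A hypothetical sketch of the `maxProbes` idea with linear probing (the class name `MaxProbesSet` and all field names are mine, not actual Guava internals): no element needed more than `maxProbes` slots to find a home during construction, so a lookup can stop after inspecting that many slots.

```java
public class MaxProbesSet {
    private final Object[] table;
    private final int mask;
    private final int maxProbes; // longest probe run needed during construction

    // tableSize must be a power of two and larger than elements.length.
    MaxProbesSet(Object[] elements, int tableSize) {
        table = new Object[tableSize];
        mask = tableSize - 1;
        int worst = 0;
        for (Object e : elements) {
            int probes = 1;
            int i = e.hashCode() & mask;
            while (table[i] != null) { // linear probing for a free slot
                i = (i + 1) & mask;
                probes++;
            }
            table[i] = e;
            worst = Math.max(worst, probes);
        }
        maxProbes = worst;
    }

    boolean contains(Object target) {
        int i = target.hashCode() & mask;
        // Stop after maxProbes slots: anything present lies within that
        // distance of its home slot, so probing further is pointless.
        for (int p = 0; p < maxProbes; p++, i = (i + 1) & mask) {
            Object candidate = table[i];
            if (candidate == null) return false;
            if (candidate.equals(target)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        MaxProbesSet set = new MaxProbesSet(new Object[] {"a", "b", "c"}, 8);
        System.out.println(set.contains("a") + " " + set.contains("zzz"));
        // prints "true false"
    }
}
```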
I think I've got a nice solution for all these problems (soon to come).
Original comment by Maaarti...@gmail.com
on 2 Oct 2012 at 1:41
Attachments:
> Here's the idea I've already announced: during `construct`, compute the maximum number of probes needed and store it in the field `maxProbes`. As there are no collisions in the case I reported, `maxProbes` equals 1, and thus `RegularImmutableSet.contains` can return false after having inspected a single slot, so it works perfectly.
I'm not seeing how this would be doable without significantly increased complication in the construction method -- we don't even remember hash codes at all at the moment.
> Note that unlike the right shifts, the rotations gives you no guarantee that
"hashCodes that differ only by constant multiples at each bit position have a
bounded number of collisions". Or maybe they do, but I can't see how.
Hrrrrm. I know you're quoting the HashMap source, but I don't really
understand what "differ only by constant multiples at each bit position" could
possibly mean. Do the bits differ by only a constant multiple? Of what?
I'm going to poke around Google for advice, maybe.
Original comment by lowas...@google.com
on 2 Oct 2012 at 11:35
I observe that the JDK's IdentityHashMap uses open addressing, and its
"smearing" function looks like
int h = System.identityHashCode(x);
// Multiply by -127, and left-shift to use least bit as part of hash
return ((h << 1) - (h << 8)) & (length - 1);
Worth exploring, perhaps?
Original comment by lowas...@google.com
on 2 Oct 2012 at 11:49
> I'm not seeing how this would be a doable thing without significant increased
complication in the construction method...
It's trivial and very cheap: just count the iterations in the inner loop; see the patch.
But in general, there's a tradeoff with unknown parameters: storing the hashes would speed up reprobing, and so could spending more time in `smear` and/or in the construction. Is there any chance to find out, e.g., how many `contains` calls per `ImmutableSet` there are on average in your real-world programs?
> > > Note that unlike the right shifts, the rotations...
> Hrrrrm. I know you're quoting the HashMap source, but I don't really
understand what "differ only by constant multiples at each bit position" could
possibly mean. Do the bits differ by only a constant multiple? Of what?
I can only guess what they mean, too. But whatever property it is, it can't be destroyed by a final multiplication, as there's no additional flow from the higher bits to the lower ones. An initial multiplication (i.e. before the right shift) is a different case, as the two together create such a flow. It's hard to explain what I mean, as it's based on my fuzzy understanding of their stated goal.
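The claim that a final multiplication can't destroy such a property can be made precise: an odd multiplier is invertible mod 2^32, so multiplying by it is a bijection on `int` and can never merge two distinct smeared values, only reshuffle them. A small sketch of my own (using Newton's iteration to compute the modular inverse):

```java
public class OddMultiplierBijection {
    public static void main(String[] args) {
        int m = 0x5BD1E995; // the MurmurHash2 multiplier; odd
        int inv = m;        // for odd m, m is its own inverse mod 2^3
        for (int k = 0; k < 5; k++) {
            inv *= 2 - m * inv; // Newton's iteration doubles the correct bits
        }
        if (m * inv != 1) throw new AssertionError("inverse computation failed");
        for (int h : new int[] {0, 1, -1, 0x12345678, Integer.MIN_VALUE}) {
            // Multiplying and then un-multiplying round-trips exactly,
            // so the multiplication loses no information.
            if ((h * m) * inv != h) throw new AssertionError("not a bijection");
        }
        System.out.println("multiplication by an odd constant is a bijection on int");
    }
}
```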
> I'm going to poke around Google for advice, maybe.
You mean personally? I couldn't find anything by googling.
> I observe that the JDK's IdentityHashMap... Worth exploring, perhaps?
I don't think so. The multiplication puts nearby hashes far apart, which is a good thing because of the linear reprobing, but that's about all. The hash distribution is different and the cost of reprobing is much lower.
Original comment by Maaarti...@gmail.com
on 3 Oct 2012 at 1:08
Attachments:
> I'm going to poke around Google for advice, maybe.
Perhaps a more accurate statement would be
> I'm going to poke around Googlers for advice, maybe.
In any event, I'm getting the impression that we agree that an initial multiplication strictly helps -- it's still possible to trigger the worst case, of course, but it requires being more deliberate about it. Should I go ahead and push that change while we continue the discussion on the other changes?
Original comment by wasserman.louis
on 3 Oct 2012 at 8:49
> In any event, I'm getting the impression that we agree that an initial
multiplication strictly helps...
Yes. Currently, I know of no better way to find the cases where the multiplication is detrimental than brute-forcing it.
It's very cheap on modern CPUs:
StandardSmearer 1.71
MultiplyingSmearer 1.97
As it is very cheap and improves the distribution, it might speed up the search even in commonly occurring cases, too.
> Should I go ahead and try to push ahead on that change while we continue the
discussion on the other changes?
It depends on the probability that someone gets hit by this problem in the
meantime. That's something I can't tell.
Original comment by Maaarti...@gmail.com
on 4 Oct 2012 at 9:57
This small program illustrates how I understand what "differ only by constant
multiples at each bit position" means:
https://dl.dropbox.com/u/4971686/published/maaartin/guava/SmearerDemo.java
According to it:
- final multiplication is good
- multiplication before the shifts is bad
- replacing the second two shifts by rotations is good
- using rotations only is bad
- the simplified shifting from ConcurrentHashMapV8 is bad
But the program only counts collisions assuming closed hashing. Secondary
collisions due to rehashing are a very different beast.
Original comment by Maaarti...@gmail.com
on 11 Oct 2012 at 9:55
I'm putting this down as fixed by
https://code.google.com/p/guava-libraries/source/detail?r=69ad96b719d7cd3d872a948d7454f17b816a21c2 .
Even if the new smearing function can be defeated, it's at least much more
difficult to make it blow up accidentally. (Incidentally, it also beats the
old Hashing.smear in benchmarks.)
Original comment by lowas...@google.com
on 30 Oct 2012 at 7:20
Original comment by lowas...@google.com
on 30 Oct 2012 at 7:21
This issue has been migrated to GitHub.
It can be found at https://github.com/google/guava/issues/<id>
Original comment by cgdecker@google.com
on 1 Nov 2014 at 4:13
Original comment by cgdecker@google.com
on 3 Nov 2014 at 9:08
Original issue reported on code.google.com by
Maaarti...@gmail.com
on 27 Sep 2012 at 10:48