Closed — MartinNowak closed this issue 9 years ago
How do you plot the benchmark results? The csv alone isn't too readable.
Primary clustering with linear probing can be a real performance problem, especially with large tables like this.
This probing scheme is also used in SparseHash, and I recently deployed it in dmd's stringtable:
http://google-sparsehash.googlecode.com/svn/trunk/doc/implementation.html
https://github.com/D-Programming-Language/dmd/pull/4088
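For context, here is a minimal sketch (my own illustration, not code from Higgs or dmd) of the triangular-number quadratic probing that SparseHash describes: probing with offsets i*(i+1)/2 is equivalent to adding i+1 on the i-th step, and with a power-of-two table size the sequence visits every slot exactly once, so it avoids primary clustering without ever missing a free slot.

```python
def probe_sequence(h, size):
    """Yield the full probe sequence for hash h in a table of `size` slots.

    Uses triangular-number quadratic probing: offsets 0, 1, 3, 6, 10, ...
    (i.e. i*(i+1)/2), obtained by adding i+1 at step i. For a power-of-two
    table size this visits every slot exactly once.
    """
    assert size & (size - 1) == 0, "size must be a power of two"
    idx = h % size
    for i in range(size):
        yield idx
        idx = (idx + i + 1) % size  # cumulative offset is i*(i+1)/2

# Every slot is visited exactly once, so an insert can never fail
# while a free slot remains:
assert sorted(probe_sequence(12345, 64)) == list(range(64))
```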
For performance measurements, you should use the "nocomptime" benchmarks. From the source directory, run:
./benchmark.py --vm_cmd="./higgs" --bench_list="benchmarks/nocomptime/benchmark-list.csv"
It will output timing results directly. If you want to use the CSV file, you should look at "exec time (ms)".
Regarding the pull request, is this assert correct: `step < newSize`?
If you can show measurable speedups, I'll accept the PR.
The benchmark results are a bit inconclusive because your load factor is fairly low (0.6). I get some speedups for splay, but when I checked the number of collisions, splay averages ~70 slots visited before finding an entry. With quadratic probing it's more like 50, but neither value makes sense. A few other benchmarks also show a suspicious number of collisions.
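To show why ~70 probes is suspicious: under the standard uniform-hashing model, a successful search with linear probing at load factor α costs about (1 + 1/(1−α))/2 probes on average, which is only ~1.75 at α = 0.6. The rough simulation below (my own sketch with ideal random keys, not the Higgs benchmark) reproduces numbers in that ballpark for both linear and triangular quadratic probing, suggesting the high collision counts come from a poor hash function rather than the probing scheme.

```python
import random

def avg_probes(size, load, quadratic):
    """Fill a table to the given load factor with random keys, then return
    the mean number of slots inspected per successful lookup."""
    table = [None] * size
    keys = random.sample(range(10**9), int(size * load))
    for k in keys:
        idx, i = k % size, 0
        while table[idx] is not None:
            i += 1
            # quadratic: add i each step (triangular offsets); linear: add 1
            idx = (idx + (i if quadratic else 1)) % size
        table[idx] = k
    total = 0
    for k in keys:
        idx, i = k % size, 0
        while table[idx] != k:
            i += 1
            idx = (idx + (i if quadratic else 1)) % size
        total += i + 1  # slots inspected, including the final hit
    return total / len(keys)

random.seed(0)
# Both should land near the theoretical ~1.75, nowhere near 70:
print(avg_probes(4096, 0.6, quadratic=False))
print(avg_probes(4096, 0.6, quadratic=True))
```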
see #171
How does the change in #171 affect things?
There is no measurable speedup with a load factor of 0.6, so I'll close it.