Map optimisations - Githubissues

billyquith commented 4 years ago

I experimented with some numbers and changing the GROW_FACTOR to 3 gives a 20% speed up. Other numbers seem to be less optimal. I assume higher numbers become a lot more inefficient when the table is very large due to initialising and copying to huge table. MAP_LOAD_PERCENT seems to be good at 75%.

https://github.com/billyquith/wren/commit/5b75dc301a39228d48373ef4e760e088325e743c

base
GROW_FACTOR 2 MAP_LOAD_PERCENT 75
map_numeric - wren             .......... 2.07s 0.0701 100.73% relative to baseline

GROW_FACTOR 3 MAP_LOAD_PERCENT 75
map_numeric - wren             .......... 1.71s 0.0245 122.05% relative to baseline
map_numeric - wren             .......... 1.72s 0.0452 121.06% relative to baseline

GROW_FACTOR 4 MAP_LOAD_PERCENT 75
map_numeric - wren             .......... 1.76s 0.0962 118.44% relative to baseline

GROW_FACTOR 5 MAP_LOAD_PERCENT 75
map_numeric - wren             .......... 1.88s 0.1039 110.87% relative to baseline

GROW_FACTOR 3 MAP_LOAD_PERCENT 70
map_numeric - wren             .......... 2.20s 0.0563  94.55% relative to baseline

GROW_FACTOR 3 MAP_LOAD_PERCENT 80
map_numeric - wren             .......... 1.74s 0.0461 119.87% relative to baseline

I ran the full benchmarks twice.

api_call - wren                .......... 0.14s 0.0025  98.54% relative to baseline
api_foreign_method - wren      .......... 0.93s 0.0296  97.95% relative to baseline
binary_trees - wren            .......... 0.55s 0.0072 100.55% relative to baseline
binary_trees_gc - wren         .......... 2.55s 0.0361  97.17% relative to baseline
delta_blue - wren              .......... 0.45s 0.0044  93.78% relative to baseline
fib - wren                     .......... 0.65s 0.0296  92.88% relative to baseline
fibers - wren                  .......... 0.13s 0.0017  99.24% relative to baseline
for - wren                     .......... 0.23s 0.0110 100.86% relative to baseline
method_call - wren             .......... 0.33s 0.0044  97.85% relative to baseline
map_numeric - wren             .......... 1.73s 0.0208 120.50% relative to baseline
map_string - wren              .......... 0.19s 0.0017 107.25% relative to baseline
string_equals - wren           .......... 0.40s 0.0127  98.50% relative to baseline

api_call - wren                .......... 0.14s 0.0021 100.00% relative to baseline
api_foreign_method - wren      .......... 0.93s 0.0281  97.85% relative to baseline
binary_trees - wren            .......... 0.55s 0.0038 100.18% relative to baseline
binary_trees_gc - wren         .......... 2.56s 0.0779  96.72% relative to baseline
delta_blue - wren              .......... 0.45s 0.0291  94.20% relative to baseline
fib - wren                     .......... 0.64s 0.0179  93.90% relative to baseline
fibers - wren                  .......... 0.13s 0.0055 100.00% relative to baseline
for - wren                     .......... 0.23s 0.0027 100.43% relative to baseline
method_call - wren             .......... 0.32s 0.0029  99.07% relative to baseline
map_numeric - wren             .......... 1.72s 0.0382 120.92% relative to baseline
map_string - wren              .......... 0.19s 0.0027 107.25% relative to baseline
string_equals - wren           .......... 0.40s 0.0197  99.25% relative to baseline

I wondered if anyone else had experimented with this.

mhermier commented 4 years ago

Personal opinion, grow facteur should be configurable.

billyquith commented 4 years ago

https://github.com/wren-lang/wren/blob/44d6d205866424c07ab48a499a8cf42fe1bcda92/src/vm/wren_value.c#L14

But the default values could be better?

mhermier commented 4 years ago

There is no such thing as better default values. It only depends on usage/mesure/allocation strategies... The value you want to change only shows that the default allocation grows doesn't feat the usage. In the real world you will want to reserve enough to avoid growth.

CohenArthur commented 4 years ago

@billyquith Default values can be better, but you have to take performance as well as memory consumption into account. Your value is better for a simple reason: We have to allocate WAY less, especially on big programs. However, the memory cost is way higher. You could even set it to 20 and you'll only have to reallocate once for most programs, making it very "efficient". But the memory impact would be hard, especially for an embedded language where there might already be constraints .

Tuning these, here, probably mean benchmarking different values against A LOT of code. I've seen a couple projects where the best growth factor was 1.75, and they had taken into account the realloc algorithm, average map load and optimal memory consumption

billyquith commented 4 years ago

The cost of resize (larger and smaller) and the associated copies and insertions of the values is significant. If noone has investigated what the best performance/memory trade off is then perhaps the first guess of the best values is not the best. There is a comment in the code to indicate this.

3 might be high, but the cost of resizing appears to be quite high in Wren. Other languages, like Lua, seem to have much more efficient table implementations.

map_numeric - wren             .......... 1.73s 0.0912 120.43% relative to baseline - growth factor 3
map_numeric - lua              .......... 0.22s 0.0057  12.96%  - much faster
map_numeric - python           .......... 1.72s 0.0743  99.46%

mhermier commented 4 years ago

Because there is no answer, because there is no ideal generic situation to test against. In my program 3 might be a waste of memory. But in yours 20 might not be enough. If you really see/need an improvement in your program, just tune it in your program. Like I said the best growth, is no growth. So questing for the ideal value is futile and a waste of time.

benpop commented 4 years ago

@billyquith Beware of a comparison to Lua here: map_numeric in the standard Lua C implementation is much faster because the Lua "map" (table) is implemented as a hybrid array/map. Since the keys are added sequentially from 1, the implementation only uses/grows the array portion.

Note the actual hash map portion only grows/rehashes when there are no other slots remaining. Lua tables do not have a load factor.

billyquith commented 4 years ago

@benpop, Yes, thanks, I was aware of the dual hashtable implementation in Lua, and you're right the map_string benchmark would be a fairer comparison for the hash table. Wren works better with higher growth factor, Lua has a faster hash table, and Python 3.8 is slightly slower than Wren.

map_string - wren              .......... 0.18s 0.0047 113.74% relative to baseline
map_string - lua               .......... 0.14s 0.0027  76.37%
map_string - python            .......... 0.17s 0.0063  94.44%

However, the map_numeric benchmark gives an idea for how much better performance can be with a different perspective or specific tuning. Perhaps Wren could benefit from dynamic arrays or a dual hashtable implementation.

billyquith commented 4 years ago

@benpop

Note the actual hash map portion only grows/rehashes when there are no other slots remaining. Lua tables do not have a load factor.

That is an interesting observation but the Lua hash table keeps track of the free list, so its insertions are more optimal. Wren has lots of collisions (duplicate indices) whilst inserting, accessing, and resizing, especially in smaller tables. This is likely why aggressively growing the hash table is effective. This also assumes that the start index (modulo hash) has good distribution.

Lua also does more to try and avoid collisions by moving colliding nodes that are in inappropriate slots to available slots. Wren, on the other hand, has to traverse a large part of the entries when there are a lot of collisions, so its behaviour tends towards O(n) (a linear search), rather than O(1), which a hash table should be (ideally).

wren-lang / wren

Map optimisations #847