Closed rurban closed 7 years ago
And he is indeed right. There's a lot of slack.
`-Accflags=-DHASH_FILL_RATE=200`
memtest.sh: fillrate 200% vs 90%.
7% less memory with p1, the unicode regex test: `-e'use utf8; my $s="x≈x"; $s=~s/a≈e//i;'`
timings are similar, 2-7% slower.
massif peaks:

| massif test | 200% (2xfillrate) | 90% (perl) |
|---|---|---|
| m0 | 256060 | 255977 |
| m1 | 380829 | 382614 |
| p0 | 717081 | 720200 |
| p1 | 2566637 | 2756405 |
bench-mini (more is faster):

| test | 200% | 90% |
|---|---|---|
| hash/copy | 132 | 135 |
| hash/each | 107 | 105 |
| hash/foreach-sort | 88 | 83 |
| hash/foreach | 85 | 89 |
| hash/get | 117 | 115 |
| hash/set | 102 | 105 |
"smaller arrays with double collisions", i.e. the hash still uses chaining, but looks at two bucket slots?
yes. half the HvARRAY size, and double the HE chain depth. less space, more time.
toddr prefers memory over speed.
for reference, here is a work in progress playing with this, plus some benchmarks: https://rt.perl.org/Public/Bug/Display.html?id=130084
The main difficulty here is finding a solution that is universally better, in all or most contexts. It's pretty easy to come up with samples that show a degradation in either speed or memory.
It's always a tradeoff. I think in the end, whatever the choice is, it should probably be a compile-time option.
Note that this [perl #130084] patch will never make it into cperl.
cperl instead offers the easy `-Accflags=-DHASH_FILL_RATE=200`
configuration if someone wants to trash their hash linked lists with this.
cperl uses an optimized version of the split calculation with `__builtin_ctz`, so it can afford the uneven customized fill rate. Your patch doesn't use that, so it's much slower than cperl.
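The `__builtin_ctz` trick might look roughly like this. This is a hedged sketch of the idea only, not cperl's actual split code: bucket counts are always powers of two, so `__builtin_ctz` recovers log2 of the bucket count in one instruction, and the fill check for an "uneven" rate like 90 or 200 can be done with a multiply, a shift and a compare, with no division on the hot path.

```c
#include <stdint.h>

/* Sketch only, not cperl's real code. HASH_FILL_RATE is in percent,
 * overridable at build time, e.g. -DHASH_FILL_RATE=200. */
#ifndef HASH_FILL_RATE
#define HASH_FILL_RATE 90
#endif

/* split when keys/buckets exceeds HASH_FILL_RATE/100, rewritten as
 *   keys * 100 > HASH_FILL_RATE << log2(buckets)
 * so no division is needed even for rates that aren't powers of two */
static inline int should_split(uint32_t keys, uint32_t buckets) {
    unsigned log2b = (unsigned)__builtin_ctz(buckets); /* buckets == 1u << log2b */
    return (uint64_t)keys * 100 > ((uint64_t)HASH_FILL_RATE << log2b);
}
```

At the default 90% rate and 8 buckets the threshold is 7.2 keys, so the 8th insertion triggers the split; at 200% the same table would hold 16 keys first.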
for less memory wait until one of these cperl hash refactors land:
`feature/gh102-smallhash`, `feature/gh117-hashiter`, `feature/gh24-base-hash`, `feature/gh24-oldnew-hash-table`, `featurex/gh24-hash-loop`, `featurex/gh24-hash-loop+utf8`, `featurex/gh24-hash-utf8`, `featurex/gh24-open-hash`, `feature/gh24-he-array`, `featurex/gh24-array_he`, `featurex/gh24-one-word-ahe`
smallhash, open-hash and one-word-ahe all use dramatically less memory, and I haven't yet added the new python/ruby/php hashes with the indirect small index, with factor 2 for bigger hashes.
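The "indirect small index" idea (as in CPython's compact dict since 3.6, and similar layouts in newer Ruby and PHP) can be sketched roughly like this. A hypothetical toy, not cperl's planned implementation: the sparse part of the table is an array of tiny indices into a dense, insertion-ordered entry array, so empty slots cost one byte instead of a full pointer for small tables.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define NSLOTS 16            /* power of two; real tables resize */
#define EMPTY  (-1)

typedef struct { const char *key; const char *val; } Entry;

typedef struct {
    int8_t  index[NSLOTS];   /* sparse: 1 byte per slot for <= 127 entries */
    Entry   entries[NSLOTS]; /* dense, insertion-ordered */
    int     nentries;
} CompactHash;

static uint32_t hash_str(const char *s) {
    uint32_t h = 2166136261u;               /* FNV-1a */
    while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
    return h;
}

static void ch_init(CompactHash *t) {
    memset(t->index, EMPTY, sizeof t->index); /* 0xFF bytes == -1 as int8_t */
    t->nentries = 0;
}

static void ch_set(CompactHash *t, const char *key, const char *val) {
    uint32_t slot = hash_str(key) & (NSLOTS - 1);
    while (t->index[slot] != EMPTY) {
        if (strcmp(t->entries[t->index[slot]].key, key) == 0) {
            t->entries[t->index[slot]].val = val;  /* update in place */
            return;
        }
        slot = (slot + 1) & (NSLOTS - 1);          /* linear probing */
    }
    t->index[slot] = (int8_t)t->nentries;
    t->entries[t->nentries].key = key;
    t->entries[t->nentries].val = val;
    t->nentries++;
}

static const char *ch_get(const CompactHash *t, const char *key) {
    uint32_t slot = hash_str(key) & (NSLOTS - 1);
    while (t->index[slot] != EMPTY) {
        if (strcmp(t->entries[t->index[slot]].key, key) == 0)
            return t->entries[t->index[slot]].val;
        slot = (slot + 1) & (NSLOTS - 1);
    }
    return NULL;
}
```

Growing the index width to int16/int32 for bigger tables is where the "factor 2 for bigger hashes" comes in: the sparse array stays a fraction of the size of a full pointer array.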
toddr thinks that using smaller arrays with double collisions will save memory and will not be much slower.