Open rurban opened 9 years ago
Merged now very simple, traditional improvements:
no hashtable rewrite yet.
Current problem: corrupt hek with tied hv.
./miniperl -Ilib -MConfig -e'print keys %Config'
The reason was that HVf_SVKEY(-2) was compared against HEK_LEN which is now an unsigned 31 bit value. Fixed with 746a63c91a306bc26713574ca67420bcd279758b
Fixed also an issue with a wrong HekTAINTED check with HeKSVKEY. There the sv is tainted or not, the hek has no flags. This 9e9ec088791b75fcaf5271a2ebb1c412f721ab2c needs to be backported.
With d58d80f40124ed394a72e5f41782c1ca86f26e7e found and fixed an important hsplit bug.
Detected with -DHv diffing and terminal broadcast input to two panes in parallel (iTerm2) in 2 gdb sessions. Very nice debugging this way to compare old vs new.
failing minitests with cperl-5.25.0-165-g0c508ac featurex/gh24-array_he
mpb t/op/local.t 108,229,232,285,294
t/op/lex_assign.t 281-304
t/op/smartmatch.t 134
t/op/stash.t 5 (fixed with e3f6ad7161bdc4689b81cee207e5aa37a9653875)
t/op/tie_fetch_count.t 75
../do-bench-mini: 2-9% faster
fixed with 868b47c6bb53b375089b52ab7a1c25b933335ffa fix deletion of first_entry
The old hash table uses:
either use:
HvARRAY = AHE[]
) can contain{ hent_he, hent_hash }
. this way the hash catches 99% of all comparisons already, and we don't have to chase the external hek ptr, when the hash check fails. every HE entry will then be 2 words (128), instead of one word (64), called AHE. The linked list still contains the oldHE*
, with{ hent_next, hent_hek, hent_val }
.maybe use the glibc htable (hsearch, hcreate), just add random seeds and flood detection (collision counts). (or khash, which is basically the same). coucal looks good also. https://github.com/xroche/coucal but best is probably preshing's http://preshing.com/20160314/leapfrog-probing/, which even can be made concurrent easily.
open addressing is cache friendly and much faster, but table growth is slow. potion uses khash, first double open, then quadratic open.
either way, first I need to abstract away the collision search in MACROS and reduce the needed comparisons from 4 to 1. (edit: after my critic, they improved that part to one ptr cmp) maybe we can normalize all keys to utf8 for faster checks. (probably not) utf8 is a HEK flag. See http://cpansearch.perl.org/src/RURBAN/illguts-0.49/index.html#hek The current implementation is just insanity and it only got worse over the years.
The last attempt was here: https://github.com/rurban/perl/commits/rurban/hash-sortbuckets See the
featurex/gh24-*hash*
branches. See also GH #102plan