perl11 / cperl

A perl5 with classes, types, compilable, company friendly, security
http://perl11.org/
Other
142 stars 17 forks source link

new hash table #24

Open rurban opened 9 years ago

rurban commented 9 years ago

The old hash table uses:

either use:


maybe use the glibc htable (hsearch, hcreate), just add random seeds and flood detection (collision counts). (or khash, which is basically the same). coucal looks good also. https://github.com/xroche/coucal but best is probably preshing's http://preshing.com/20160314/leapfrog-probing/, which even can be made concurrent easily.

open addressing is cache friendly and much faster, but table growth is slow. potion uses khash, first double open, then quadratic open.

either way, first I need to abstract away the collision search in MACROS and reduce the needed comparisons from 4 to 1. (edit: after my critic, they improved that part to one ptr cmp) maybe we can normalize all keys to utf8 for faster checks. (probably not) utf8 is a HEK flag. See http://cpansearch.perl.org/src/RURBAN/illguts-0.49/index.html#hek The current implementation is just insanity and it only got worse over the years.

The last attempt was here: https://github.com/rurban/perl/commits/rurban/hash-sortbuckets See the featurex/gh24-*hash* branches. See also GH #102


plan

rurban commented 8 years ago

Merged now very simple, traditional improvements:

no hashtable rewrite yet.

rurban commented 8 years ago

Current problem: corrupt hek with tied hv. ./miniperl -Ilib -MConfig -e'print keys %Config'

The reason was that HVf_SVKEY(-2) was compared against HEK_LEN which is now an unsigned 31 bit value. Fixed with 746a63c91a306bc26713574ca67420bcd279758b

Fixed also an issue with a wrong HekTAINTED check with HeKSVKEY. There the sv is tainted or not, the hek has no flags. This 9e9ec088791b75fcaf5271a2ebb1c412f721ab2c needs to be backported.

rurban commented 8 years ago

With d58d80f40124ed394a72e5f41782c1ca86f26e7e found and fixed an important hsplit bug.

Detected with -DHv diffing and terminal broadcast input to two panes in parallel (iTerm2) in 2 gdb sessions. Very nice debugging this way to compare old vs new.

rurban commented 8 years ago

failing minitests with cperl-5.25.0-165-g0c508ac featurex/gh24-array_he

mpb t/op/local.t 108,229,232,285,294 t/op/lex_assign.t 281-304 t/op/smartmatch.t 134 t/op/stash.t 5 (fixed with e3f6ad7161bdc4689b81cee207e5aa37a9653875) t/op/tie_fetch_count.t 75

../do-bench-mini: 2-9% faster

fixed with 868b47c6bb53b375089b52ab7a1c25b933335ffa fix deletion of first_entry