hordi / hash

Fast C++ flat (open addressing) hash set, map
MIT License
3 stars 0 forks source link

benchmark hash #2

Open ktprime opened 4 years ago

ktprime commented 4 years ago

I have use your bench code in my github https://github.com/ktprime/emhash/blob/master/bench/hbench.cpp. now I have test more than 6 different hash map' bench code.

ktprime commented 4 years ago

from my benchmark hdset is really fast for inserting but some slow for find beacase of tombstones used?. Is a optimize linear probing of find empty bucket in your code?

hordi commented 4 years ago

Correct, it uses linear probing. For now I don't see a trivial solution to improve that part, open for any idea. Will do some research of using "robin hood" and "index" (to simplify swap-part).

ktprime commented 4 years ago

linear probing suffers primary clustering if bad hash function input. quadratic probing also incures secondary clustering and much more cache miss. my early probing combine with three different probing way to search the empty bucket.

ktprime commented 1 year ago

what‘s new ?

hordi commented 1 year ago

Compiler syntax fixed + micro-optimization, nothing really important.

2 November 2022, 01:14:58, by "hyb" < @.*** >:

what‘s new ?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

ktprime commented 1 year ago

some new benchmark https://martin.ankerl.com/2022/08/27/hashmap-bench-01/

hordi commented 1 year ago

When you have some time - could you run performance comparison in your benchmark for my new implementations: hash_set_m.h and hash_set1.h. Both have the same interface just different namespace: hdr_m and hdr1 respectively.

ktprime commented 1 year ago

OK, i'll try, pls add interface insert_or_assign. from my benchmark it's quite fast

======== hash  top1   top2  top3 =======================
    absl_flat  9.0   1.0    0
      emhash5  0.0   0.0    1
      emhash8  0.0   0.0    1
       hrdset  0.0   7.0    0
 martin_dense  0.0   0.0    3
  martin_flat  1.0   2.0    5
======== hash    score  weigh ==========================
    absl_flat    84     100.0 %
       hrdset    81     104.4 %
  martin_flat    78     107.7 %
 martin_dense    77     109.7 %
      emhash5    74     114.6 %
      emhash8    73     116.0 %
      emhash6    68     124.3 %
      emhash7    65     128.7 %
hordi commented 1 year ago

insert_or_assign - any example? what is the expected behavior of this function?

ktprime commented 1 year ago

detail function test. you can add to my ebench.cpp

-------------------------------- function benchmark -----------------------------------------------
erase_50
  66        emhash5                100.0 %
  73        hrdset                 110.6 %
  73        emhash7                110.9 %
  79        emhash6                119.4 %
  93        martin_dense           141.0 %
  95        emhash8                142.7 %

erase_50_find
  61        emhash5                100.0 %
  63        emhash8                102.1 %
  64        emhash7                103.8 %
  66        emhash6                106.8 %
  87        martin_dense           142.1 %
  96        hrdset                 155.3 %

erase_50_reinsert
  53        martin_dense           100.0 %
  60        emhash8                113.0 %
  62        hrdset                 118.0 %
  85        emhash6                160.1 %
  92        emhash5                173.2 %
  93        emhash7                175.9 %

find_erase50
  46        martin_dense           100.0 %
  47        emhash5                102.6 %
  47        hrdset                 104.0 %
  50        emhash8                110.7 %
  93        emhash6                203.4 %
  96        emhash7                209.5 %

find_hit_0
  46        martin_dense           100.0 %
  63        emhash8                137.7 %
  77        hrdset                 167.1 %
  86        emhash7                187.6 %
  95        emhash5                206.4 %
  97        emhash6                212.6 %

find_hit_100
  47        emhash6                100.0 %
  48        emhash5                101.6 %
  52        hrdset                 110.5 %
  60        emhash7                127.1 %
  86        emhash8                181.6 %
  94        martin_dense           199.3 %

find_hit_50
  81        martin_dense           100.0 %
  84        emhash8                102.7 %
  85        emhash5                104.5 %
  87        hrdset                 106.8 %
  89        emhash7                110.0 %
  91        emhash6                112.2 %

insert_erase
  42        hrdset                 100.0 %
  55        emhash7                130.1 %
  58        emhash6                136.7 %
  64        emhash5                150.8 %
  70        emhash8                165.8 %
  96        martin_dense           226.9 %

insert_find_erase
  50        hrdset                 100.0 %
  60        martin_dense           120.0 %
  68        emhash8                136.1 %
  73        emhash5                146.6 %
  78        emhash7                156.5 %
 100        emhash6                198.9 %

insert_hit
  60        emhash8                100.0 %
  69        martin_dense           113.2 %
  69        emhash5                114.1 %
  77        emhash6                126.7 %
  86        hrdset                 141.9 %
  93        emhash7                152.8 %

insert_l1_cache
  23        emhash5                100.0 %
  23        emhash6                100.0 %
  25        emhash8                107.0 %
  28        emhash7                122.9 %
  34        martin_dense           149.5 %
 100        hrdset                 428.1 %

insert_l3_cache
  57        hrdset                 100.0 %
  58        martin_dense           101.4 %
  70        emhash8                122.1 %
  73        emhash6                127.6 %
  74        emhash5                129.4 %
 100        emhash7                173.1 %

insert_no_reserve
  55        hrdset                 100.0 %
  65        martin_dense           117.8 %
  80        emhash8                144.4 %
  82        emhash6                148.4 %
  87        emhash5                156.0 %
  98        emhash7                177.1 %

insert_reserve
  44        martin_dense           100.0 %
  49        hrdset                 112.2 %
  58        emhash8                131.5 %
  79        emhash6                179.1 %
  83        emhash5                188.9 %
  99        emhash7                223.6 %

multi_small_ife
  72        martin_dense           100.0 %
  75        emhash7                104.1 %
  78        emhash5                107.6 %
  80        emhash6                110.8 %
  86        emhash8                119.7 %
  97        hrdset                 134.2 %

======== hash  top1   top2  top3 =======================
      emhash5  1.0   3.0    3
      emhash8  5.5   2.0    5
       hrdset  3.5   2.0    5
 martin_dense  4.0   6.0    1
======== hash    score  weigh ==========================
 martin_dense    83     100.0 %
       hrdset    81     101.5 %
      emhash8    80     103.7 %
      emhash5    79     104.1 %
      emhash6    74     112.1 %
      emhash7    69     119.0 %
--------------------------------------------------------------------
ktprime commented 1 year ago

it's some like operator[], what's your default load factor?

hordi commented 1 year ago

Load factor is fixed 0.5 for all versions except of hash_set7.h. But hash_set7.h I can't say it is fast because for high load factor there too many collisions and this design uses linear probing actually.

ktprime commented 1 year ago

this new version has any new ideal to improve? it's seem efficient than my emhash and quite fast on insert. it has bugs by benchmark(some test case failed).

hordi commented 1 year ago

which bugs?

hash_set.h - stores 32-bits hash value, so it is fast for complex data-type when hash-calculation is expensive. And it was designed to 32-bit hash respectively, any 64-bits (size_t) hash will be truncated to 32-bits. hash_set1.h - NEW. Uses standard 64-bits (size_t) hash. Should work slower than hash_set.h for any "reallocation" because should calculate hash again for all current data. hash_set_m.h - NEW. Instead of 32-bits hash value stores 1-byte per each allocated "piece" of data just to mark an element used-deleted-empty. Should work good for simplest data like int32_t, int64_t etc. Almost identical to hash_set1.h but does not store hash-value as part of "key-value" record. But expected cache misses - have to look/traverse block of memory of "markers" before and after that look/compare actual keys. I have pretty different performance results on AMD (Rizen 5900X) and Intel (Core i5-8300H if I'm not wrong) CPUs vs hash_set.h comparison. hash_set6.h - same as hash_set.h, just different base class organization. hash_set7.h - similar to hash_set.h but LOAD FACTOR working here. For most data I use I see lower performance than hash_set.h because of collisions.

Things 1.Need to think - may be for erase-functions I could rehash everything if number of deleted items too big. 2.I've tried to store data in number of locations instead of one big peace of memory, but lost performance significant - like 50% just for simplest & operation and usage 8 blocks of memory with the same algorithm and functions. 3.Will try solution to store not 1-byte for element attribute but 2 bits only. But expecting some slowdown because of high price for bits check/update. But can store 3x memory for attributes. Will see.

ktprime commented 1 year ago

I change your load factor to 0.8

n = 2225267, keyType = int, valueType = int64_t(8), loop_sum|loop_rand = 682|1022 us, sum = -1528380706 emilib2 (0.531): insert_reserve 93, insert_hit 59, find_hit_100 71, find_hit_50 56, find_hit_0 46, find_erase50 53, emhash7 (0.530): insert_reserve 104, insert_hit 51, find_hit_100 33, find_hit_50 45, find_hit_0 75, find_erase50 81, emhash5 (0.530): insert_reserve 94, insert_hit 44, find_hit_100 31, find_hit_50 45, find_hit_0 76, find_erase50 45, hrdset insert_l1_cache 2224987 != 2225267 (o) hrdset insert_l3_cache 2224987 != 2225076 (o)

hordi commented 1 year ago

I've found the root of the issue - it is in insert_cache_size function. Instead of tmp = std::move(empty); should be used tmp = hash_type(); because by the Standard that empty object can store anything after move-operation, requirement only to be valid. So in my code I call swap function, stl-implementation looks like calls clear additionally but that behavior not guaranteed.

I've updated my move operators just for this test to work consistently with other libraries, here some benchmark results cross all mine containers + other default used in ebench (I was using my default hash-function also).

test.log

ktprime commented 1 year ago

excellent result, try to add your hash map to martin's bench code(martinus_bench.cpp), it's quite different from mine. but the bug is also exits in lastest version.

hrdset (0.703): multi_small_ife 317, hrdset insert_l1_cache 5899395 != 5901390 (o) hrdset insert_l3_cache 5899468 != 5900964 (o)

hordi commented 1 year ago

About the bug... Did you update that tmp = std::move(empty) to tmp = hash_type() in your code?

mbench.log

ktprime commented 1 year ago

Try to use LLVM (clang-cl)/GCC(wsl) to compile, it's better than ms vc++ on windows. I bench with (gcc/clang/vc++) 3 different compilers on many cpus, and result is different. it's hard to say which one is the fastest hash map.

hrd_m is very fast on inserting because of low load factor = 0.5, and all the other hash map's default load factor are set to 0.8 by me.

ktprime commented 1 year ago

my emhash can set default load facor to 0.99 and also fast on inserting/erasion , if you like you can test ebench.cpp with case insert_high_load and insert_erase_high. ./ebench t