Closed hitdshu closed 1 year ago
I also want to ask another question. How do I support key of the type std::pair<int, int> in your current hash_table7?
speed depends on your case.
for big key-value( sizeof(k+v) >= 24 ) emhash8 (hash_table8.hpp) is good choice, it's a dense hash map.
if you care about inserting speed or high load factor(>0.95) emhash7 is better.
emhash5 hash_table5.hpp is more efficient for erasing and find hit 100%. (add marco EMH_FIND_HIT can improve 10%)
more ressuly as flow you can try run my bench https://github.com/ktprime/emhash/blob/master/bench/ebench.cpp g++ -I.. -I../thirdparty -O3 -march=native -DET=1 -std=17 -DHOOD_HASH=1 -DABSL=1 ebench.cpp -o eb
I also want to ask another question. How do I support key of the type std::pair<int, int> in your current hash_table7?
you need a hash function if key is not integer or string. it's as same as std::unordered_map
Thanks for your kind reply.
I'll use your emhash7 in my project as it is both fast at insert and find.
And for key type of std::pair<int, int>, if your emhash7 is the same as std::unordered_map, I think boost::hash_combine might be appropriate here.
Best, Deshun
my code focus on optimiazing finding hit. some my bench result as follow.
-------------------------------- function benchmark -----------------------------------------------
erase_50
64 emhash5 100.0%
66 emhash6 98.2%
75 emhash8 85.8%
99 emhash7 65.5%
erase_reinsert
86 emhash6 100.0%
87 emhash5 98.6%
91 emhash8 94.1%
96 emhash7 89.4%
find_erase_50
76 emhash5 100.0%
82 emhash6 92.7%
85 emhash7 89.3%
95 emhash8 79.4%
find_hit_0
82 emhash8 100.0%
92 emhash6 88.7%
94 emhash5 87.1%
97 emhash7 83.8%
find_hit_100
83 emhash5 100.0%
84 emhash7 99.3%
85 emhash6 98.0%
97 emhash8 85.8%
find_hit_50
84 emhash5 100.0%
85 emhash7 97.7%
88 emhash6 94.8%
95 emhash8 87.9%
find_hit_50_erase
67 emhash8 100.0%
78 emhash5 86.8%
95 emhash6 70.9%
98 emhash7 68.8%
hash_iter
6 emhash8 100.0%
26 emhash7 25.5%
27 emhash6 24.9%
100 emhash5 6.8%
insert_erase
76 emhash5 100.0%
86 emhash6 89.1%
89 emhash7 86.3%
96 emhash8 79.4%
insert_find_erase
33 emhash5 100.0%
58 emhash8 57.5%
91 emhash6 36.8%
100 emhash7 33.7%
insert_high_load
46 emhash7 100.0%
53 emhash6 87.7%
83 emhash5 55.9%
92 emhash8 50.1%
insert_hit
79 emhash6 100.0%
86 emhash7 91.8%
94 emhash5 84.7%
94 emhash8 84.1%
insert_l1_cache
74 emhash7 100.0%
85 emhash6 87.1%
87 emhash5 85.0%
99 emhash8 75.3%
insert_l3_cache
81 emhash7 100.0%
85 emhash6 94.7%
91 emhash5 89.0%
96 emhash8 84.4%
insert_no_reserve
68 emhash6 100.0%
74 emhash7 92.5%
80 emhash5 85.1%
97 emhash8 70.7%
multi_small_ife
78 emhash5 100.0%
81 emhash6 97.0%
84 emhash7 92.9%
98 emhash8 79.7%
======== hash top1 top2 top3 =======================
emhash5 1.0 15.0 16
emhash6 27.0 6.0 0
emhash7 0.0 14.0 18
emhash8 7.0 0.0 1
======== hash score weigh ==========================
emhash8 81 95.5%
emhash7 81 95.5%
emhash6 84 98.8%
emhash5 85 100.0%
--------------------------------------------------------------------
Well, could you explain more about the generated table a bit?
I run your benchmark and got similar output, but I could not understand its concrete meaning...
the following function case erase_50 means removing 50% keys from the hash map.
erase_50
64 emhash5 100.0%
66 emhash6 98.2%
75 emhash8 85.8%
99 emhash7 65.5%
emhash5 performance score is 100 (large is better, 64 is total time used "累计时间").
with many rand data cases and the average performace as follow (including many function cases).
======== hash score weigh ==========================
emhash8 81 95.5%
emhash7 81 95.5%
emhash6 84 98.8%
emhash5 85 100.0%
my bench code uses more than10 hash function(interger key), >10 thirdpaty hash map. key/value can set to int, long, string and big struct.
g++ -O3 -march=native -mtune=native -I.. -I../thirdparty -DABSL=1 -DHOOD_HASH=1 -DRT=2 -DTKey=2 -DTVal=1 -DET=1 -std=c++17 ebench.cpp -o ebench
keyType = string, valueType = int64_t(8 bit)
-------------------------------- function benchmark -----------------------------------------------
erase_50
63 emhash8 100.0%
71 emilib2 88.2%
75 phmap_flat 83.2%
79 absl_flat 79.7%
80 emhash5 78.1%
85 martin_flat 73.5%
92 emhash6 68.3%
96 emhash7 65.3%
erase_reinsert
53 phmap_flat 100.0%
54 emhash8 98.1%
55 absl_flat 95.6%
62 emilib2 85.3%
62 emhash6 84.3%
63 emhash7 83.2%
65 emhash5 80.7%
100 martin_flat 53.0%
find_erase_50
38 emhash8 100.0%
50 emhash5 76.7%
51 martin_flat 74.4%
52 emhash7 74.2%
57 emilib2 67.2%
58 emhash6 65.5%
75 absl_flat 50.9%
100 phmap_flat 38.6%
find_hit_0
27 absl_flat 100.0%
31 phmap_flat 87.4%
32 martin_flat 85.8%
33 emilib2 83.6%
39 emhash8 69.5%
66 emhash7 41.9%
93 emhash6 29.6%
94 emhash5 29.3%
find_hit_100
48 emhash8 100.0%
69 absl_flat 69.8%
70 emhash5 68.8%
82 emhash7 59.0%
85 emilib2 56.6%
87 emhash6 55.3%
88 phmap_flat 54.9%
89 martin_flat 54.1%
find_hit_50
54 emhash8 100.0%
65 absl_flat 83.3%
76 emhash5 72.0%
78 emilib2 70.2%
81 phmap_flat 67.4%
82 martin_flat 66.3%
83 emhash7 65.4%
99 emhash6 55.0%
find_hit_50_erase
54 emhash8 100.0%
60 phmap_flat 89.2%
61 emilib2 87.5%
63 absl_flat 84.8%
75 martin_flat 71.5%
86 emhash5 62.8%
90 emhash6 60.0%
95 emhash7 56.7%
hash_iter
10 emhash8 100.0%
26 emhash7 39.5%
26 emhash6 39.0%
28 emilib2 36.3%
40 martin_flat 25.9%
41 phmap_flat 25.1%
42 absl_flat 24.6%
100 emhash5 10.4%
insert_erase
41 phmap_flat 100.0%
43 emhash6 95.1%
44 emhash7 93.6%
46 emhash5 90.1%
49 emhash8 85.1%
49 emilib2 84.4%
59 absl_flat 69.9%
100 martin_flat 41.7%
insert_find_erase
58 emilib2 100.0%
59 phmap_flat 98.3%
60 absl_flat 96.7%
63 emhash8 92.1%
75 martin_flat 77.2%
79 emhash7 73.4%
95 emhash5 61.0%
97 emhash6 60.0%
insert_high_load
29 emhash6 100.0%
29 emhash7 99.5%
39 emhash8 74.5%
52 absl_flat 55.9%
54 emilib2 54.1%
61 phmap_flat 47.5%
78 martin_flat 37.5%
99 emhash5 29.5%
insert_hit
42 absl_flat 100.0%
42 phmap_flat 100.0%
50 emilib2 84.2%
62 emhash8 68.9%
64 emhash7 65.9%
66 emhash6 64.3%
71 emhash5 59.6%
100 martin_flat 42.7%
insert_l1_cache
49 emhash6 100.0%
50 emhash8 97.7%
50 emhash7 97.5%
55 emhash5 89.4%
59 emilib2 84.0%
85 martin_flat 58.2%
92 absl_flat 53.9%
94 phmap_flat 52.3%
insert_l3_cache
45 phmap_flat 100.0%
49 absl_flat 93.3%
54 emilib2 84.3%
70 emhash8 65.1%
71 emhash5 64.1%
73 emhash6 62.1%
78 emhash7 58.8%
94 martin_flat 48.6%
insert_no_reserve
42 phmap_flat 100.0%
44 absl_flat 96.5%
52 emilib2 81.5%
65 emhash8 65.1%
66 emhash7 64.6%
66 emhash6 64.0%
73 emhash5 57.8%
100 martin_flat 42.7%
======== hash top1 top2 top3 =======================
absl_flat 0.0 3.0 1
emhash7 0.0 0.0 2
emhash8 6.0 1.0 0
emilib2 1.0 2.0 3
phmap_flat 0.0 1.0 1
======== hash score weigh ==========================
martin_flat 56 64.7%
emhash5 61 70.7%
emhash6 66 76.2%
emhash7 68 78.9%
phmap_flat 75 86.8%
emilib2 76 87.4%
absl_flat 76 87.5%
emhash8 87 100.0%
Thanks very much for your very kind reply.
Another question. It turns out I also need one hash set.... Does the four hash sets implemented in your code has similar speed? I'll take time to benchmark your hash set as well in this weekend.
Best, Deshun
Another question. If I want to cache the table memory across several function calls. Which hash table/hash set is possible? Can I directly clear the table/set at the start of each function call and the memory is still reused?
Another question. If I want to cache the table memory across several function calls. Which hash table/hash set is possible? Can I directly clear the table/set at the start of each function call and the memory is still reused?
As an example:
emhash::hash_map<int, int> cache_map; for (auto i = 0; i < 100; ++i) { cache_map.clear(); // *****other operations }
Can this cache the memory of the hash_map across the iterations? I.e., is the memory only allocated once if space allowed?
if u want to release memory of hash map , just call shrink_to_fit() or swap it with a empty map. clear() only removes cache data(call destruction) without memory release. call reserve() keep memory allocated.
emhash::hash_map<int, int> cache_map(1024); //keep at least 1024 free nodes, in fact capacity = 2048 nodes are allocated
cache_map.max_load_factor(0.8) ; // rehash if load_factor = size() / capacity > 0.8.
cache_map.reserve(1024 *4) // call rehash and alloc more memory.
cache_map.clear() // call destruction of each node
cache_map.shrink_to_fit(0.2) // release memory if load_factor < 0.2
Thanks for your answers
Hi,
Very excellent project!
I download your code, and make the comparison between std::unordered_map/robin_hood/unordered_dense.
I only do the benchmark on find elements. Your code is astonishingly fast. All results are with type similar to std::unordered_map<int, int>.
For finding an element in a map with 1000 elements. Here is my result: unordered_map : 15ns emhash_table7: 3.6 emhash_table8: 4.2 unordered_dense: 5.6 robin_hood: 7.8
Your implement is the fastest in terms of finding elements with 100% hit. However, which table is the fastest in all your implementations? I only tried table7 and table8.
Best, Deshun