Closed: marioroy closed this issue 1 year ago.
That's not really surprising, the whole purpose of the test was to test the speed of the hash method. I wanted to try polymur-hash - the test is in my gtl repo which is equivalent to phmap except that it requires c++17.
BTW your test has an incorrect comparison function, you should compare the `str`, not the `hash`.

I'm curious to see how you would use this in llil. `phmap` has some APIs that take the hash value if precomputed (look for `with_hash`).
Oh... The mind-boggling part was not so much the variant running faster, but the possibility that Long List is Long itself could run faster. And the joy in trying things.
> BTW your test has an incorrect comparison function, you should compare the `str`, not the `hash`.
I will try both ways for the LLIL demonstration. I do not understand why one cannot use the hash value, as it is tied to the key. Am I assuming wrongly for the hash value to be unique per key?
> `phmap` has some APIs that take the hash value if precomputed (look for `with_hash`).
I saw them at a quick glance but did not realize the possibility. I will try. Thank you.
I commented out the if statement, and the output matches between `hash_bench` and `hash_bench2`.

```cpp
// if (++cnt == 6) break;
```
Please disregard my hash_bench2 demonstration. I see now the purpose of the `hash_bench` example: to compare `std::hash` against polymur-hash.
I'm closing the issue. The hash value is already computed only one time. It has been an interesting journey. Again, thank you for the hash map library.
I compared equal using `l.str == r.str` and `l.hash == r.hash`. The output is the same. Regardless, I changed the gist to `l.str == r.str` per your suggestion.
```
hashbench: 7.016 seconds
hashbench2 l.str == r.str: 4.318 seconds
hashbench2 l.hash == r.hash: 2.813 seconds
```
The motivation for doing this was nothing more than curiosity: wondering whether `std::hash` is computed more than once per key, and if so, how much time is saved by computing it one time. I saw no measurable time difference for llil. No issue either with equal doing `return l.hash == r.hash;`. The output is the same.

Edit: Changed equal to `return l.str == r.str;`.
> I'm curious to see how you would use this in llil.
```diff
@@ -126,13 +126,19 @@
 using str_int_type = std::pair<str_type, int_type>;
 using vec_str_int_type = std::vector<str_int_type>;
+using hash_type = std::size_t;
+
+struct S {
+   str_type str;
+   hash_type hash;
+};
 // create the parallel_flat_hash_map with internal mutexes
 using map_str_int_type = phmap::parallel_flat_hash_map<
-   str_type, int_type,
-   phmap::priv::hash_default_hash<str_type>,
-   phmap::priv::hash_default_eq<str_type>,
-   phmap::priv::Allocator<phmap::priv::Pair<const str_type, int_type>>,
+   S, int_type,
+   decltype( [](const S& k) { return k.hash; } ),
+   decltype( [](const S& l, const S& r) { return l.str == r.str; } ),
+   phmap::priv::Allocator<phmap::priv::Pair<const S, int_type>>,
    10, std::mutex
 >;
@@ -224,16 +230,19 @@
 #else
    str_type s(beg_ptr, klen);
 #endif
+   hash_type h = std::hash<std::basic_string_view<char>>{}(
+      std::basic_string_view<char>{ reinterpret_cast<const char*>(&s[0]), s.size() }
+   );
    // Use lazy_emplace to modify the map while the mutex is locked.
    map_ret.lazy_emplace_l(
-      s,
+      S{ s, h },
       [&](map_str_int_type::value_type& p) {
          // called only when key was already present
          p.second += count;
       },
       [&](const map_str_int_type::constructor& ctor) {
          // construct value_type in place when key not present
-         ctor(std::move(s), count);
+         ctor(S{ std::move(s), h }, count);
       }
    );
 }
@@ -323,7 +332,7 @@
 // Store the properties into a vector
 vec_str_int_type propvec; propvec.reserve(8 + map.size());
-for (auto const& x : map) propvec.emplace_back(x);
+for (auto const& x : map) propvec.emplace_back(x.first.str, x.second);
 map.clear();
 cend2 = high_resolution_clock::now();
```
It has been a lot of fun trying things, and I learned a lot.
> Am I assuming wrongly for the hash value to be unique per key?
Yes, there is no guarantee that this would be the case. Maybe this is the case here because we have a small number of different strings, but again this is not a given.
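To make the point concrete, here is a small sketch with a deliberately weak, hypothetical hasher (not `std::hash`), chosen so a collision is trivial to produce: two distinct keys share a hash, so an equality function that compares hashes would wrongly merge them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, deliberately weak hasher for illustration only: it
// hashes a string by its length alone, so any two distinct strings of
// the same length collide.
struct LengthHash {
    std::size_t operator()(const std::string& s) const { return s.size(); }
};

// An equality predicate that compares hashes reports "equal" for
// "abc" and "xyz" even though the keys differ -- which is why the
// equality function must compare the strings themselves.
inline bool same_hash(const std::string& a, const std::string& b) {
    return LengthHash{}(a) == LengthHash{}(b);
}
```

With `std::hash` collisions are far rarer, but the standard gives no uniqueness guarantee, so the same failure mode exists.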
Thank you for the extra clarity on why not to do that. In the end, I omitted the change altogether.
I came across `with_submap` inside the tests folder. That is helpful in llil to iterate the submaps in parallel (phmap to vector).
```cpp
// Store the properties into a vector
vec_str_int_type propvec; propvec.reserve(8 + map.size());
for (auto const& x : map) propvec.emplace_back(x);
map.clear();
```
```cpp
// Store the properties into a vector
vec_str_int_type propvec; propvec.reserve(8 + map.size());
#pragma omp parallel for schedule(static, 1) num_threads(4)
for (size_t i = 0; i < map.subcnt(); ++i) {
    map.with_submap(i, [&](const map_str_int_type::EmbeddedSet& set) {
        if (set.size()) {
            vec_str_int_type v; v.reserve(8 + set.size());
            for (auto& p : set) v.emplace_back(p);
            #pragma omp critical
            propvec.insert(propvec.end(), v.begin(), v.end());
        }
    });
}
map.clear();
```
The llil4map.cc code is final.
The emhash entry recently appeared under "Explore repositories" on my GitHub page. Like `flat_hash_map`, `emhash` is not thread-safe. So, I thought about having the application handle locking, including submaps. Doing this gave me a new appreciation for `parallel_flat_hash_map`.

`emhash` performs similarly to `flat_hash_map`, but flies when processing input containing duplicate keys. I ran with 1 thread for get_properties. Behind the scenes, the examples consume 4 threads during hash map to vector. Sorting consumes at most 24 threads.
```
$ NUM_THREADS=1 ./llil4map in/biga* in/biga* | cksum
llil4map (fixed string length=12) start
use OpenMP
use boost sort
get properties 43.310 secs
phmap to vector 0.290 secs
vector stable sort 0.687 secs
write stdout 1.608 secs
total time 45.897 secs
3291503642 712085184

$ NUM_THREADS=1 ./llil4hmap in/biga* in/biga* | cksum
llil4hmap (fixed string length=12) start
use OpenMP
use boost sort
get properties 33.839 secs
hashmap to vector 0.439 secs
vector stable sort 0.663 secs
write stdout 1.530 secs
total time 36.472 secs
3291503642 712085184

$ NUM_THREADS=1 ./llil4emh in/biga* in/biga* | cksum
llil4emh (fixed string length=12) start
use OpenMP
use boost sort
get properties 27.888 secs
emhash to vector 0.293 secs
vector stable sort 0.672 secs
write stdout 1.583 secs
total time 30.438 secs
3291503642 712085184

$ NUM_THREADS=24 ./llil4map in/biga* in/biga* | cksum
llil4map (fixed string length=12) start
use OpenMP
use boost sort
get properties 3.296 secs
phmap to vector 0.318 secs
vector stable sort 0.671 secs
write stdout 1.540 secs
total time 5.827 secs
3291503642 712085184

$ NUM_THREADS=24 ./llil4hmap in/biga* in/biga* | cksum
llil4hmap (fixed string length=12) start
use OpenMP
use boost sort
get properties 3.145 secs
hashmap to vector 0.297 secs
vector stable sort 0.673 secs
write stdout 1.580 secs
total time 5.697 secs
3291503642 712085184

$ NUM_THREADS=24 ./llil4emh in/biga* in/biga* | cksum
llil4emh (fixed string length=12) start
use OpenMP
use boost sort
get properties 3.094 secs
emhash to vector 0.219 secs
vector stable sort 0.667 secs
write stdout 1.535 secs
total time 5.516 secs
3291503642 712085184
```
> That is helpful in llil to iterate the submaps in parallel (phmap to vector).
It would be more efficient to iterate once (not parallel) through the submaps to get their sizes, so you know exactly where to insert their content in the final (resized) vector, and you don't need the intermediate vector and the double copy.
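A sketch of that two-phase approach, with the submaps modeled as plain vectors so the example stands alone (the real code would read the sizes and entries via `with_submap`): a short serial pass turns the submap sizes into prefix-sum offsets, and the fill phase then writes disjoint ranges of a pre-sized vector, so it can run in parallel with no intermediate vector, no critical section, and no double copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using str_int = std::pair<std::string, int>;

// Two-phase flatten. Phase 1: a serial pass over the submaps computes
// each one's insertion offset (a prefix sum of sizes). Phase 2: copy
// each submap into its own disjoint slice of the pre-sized output;
// because the slices never overlap, this loop is safe to parallelize.
std::vector<str_int> flatten(const std::vector<std::vector<str_int>>& submaps) {
    std::vector<std::size_t> offset(submaps.size() + 1, 0);
    for (std::size_t i = 0; i < submaps.size(); ++i)
        offset[i + 1] = offset[i] + submaps[i].size();

    std::vector<str_int> out(offset.back());
    // #pragma omp parallel for schedule(static, 1)  // safe: disjoint ranges
    for (std::size_t i = 0; i < submaps.size(); ++i)
        std::copy(submaps[i].begin(), submaps[i].end(),
                  out.begin() + static_cast<std::ptrdiff_t>(offset[i]));
    return out;
}
```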
> So, I thought about having the application handle locking, including submaps.
Yes, you can certainly do that. It is equivalent to what the `parallel_hash_map` does; it just requires a little extra work.
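A minimal sketch of that extra work, using only the standard library (the class and its names are hypothetical): shard a plain `std::unordered_map` by hash and give each shard its own mutex, which is roughly what `parallel_flat_hash_map` does internally with its submaps.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical sharded counter map: 16 independent submaps, each
// guarded by its own mutex, so threads whose keys hash to different
// shards never contend with each other.
class ShardedCounter {
    static constexpr std::size_t kShards = 16;
    struct Shard {
        std::mutex mtx;
        std::unordered_map<std::string, int> map;
    };
    std::array<Shard, kShards> shards_;

    Shard& shard_for(const std::string& key) {
        // Route the key to a shard by its hash value.
        return shards_[std::hash<std::string>{}(key) % kShards];
    }

public:
    void add(const std::string& key, int count) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mtx);  // lock only this shard
        s.map[key] += count;
    }

    int get(const std::string& key) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mtx);
        auto it = s.map.find(key);
        return it == s.map.end() ? 0 : it->second;
    }
};
```

Note the key must be hashed once here to pick the shard and again inside `std::unordered_map`; avoiding that double hashing is one of the conveniences the built-in parallel map provides.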
> It would be more efficient to iterate once (not parallel) through the submaps to get their sizes, so you know exactly where to insert their content in the final (resized) vector, and you don't need the intermediate vector and the double copy.
Ah, yes. Thank you! This is fixed in the gist code. I added the run times for `llil4hmap` and `llil4emh` to the summary section in llil_results.txt.
Ah, I just noticed the hash_bench example today. Thank you for the small demonstration. The following is not an issue, but mainly curiosity. I tried hash_bench2.cc, computing `std::hash` once per key. Seeing 2.5x is mind-boggling to me. So much so that I will try the same thing for the Long List is Long demonstration and report back.