martinus / robin-hood-hashing

Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20
https://gitter.im/martinus/robin-hood-hashing
MIT License
1.5k stars 142 forks source link

1. opt. calloc memory for big hashtable #141

Closed zhanglistar closed 2 years ago

zhanglistar commented 2 years ago

If the hash table is big, say 1million or more, calloc big memory can be bottle neck, using malloc and memset metadata to optimize this.

martinus commented 2 years ago

Do you have numbers how much faster malloc + memset is? On what system?

Usually calloc should be faster. See https://stackoverflow.com/a/2688522/48181

zhanglistar commented 2 years ago

Do you have numbers how much faster malloc + memset is? On what system?

Usually calloc should be faster. See https://stackoverflow.com/a/2688522/48181

Notice that I did not memset all the memory, instead only the mInfo part which is really small, and I have tested in my program that will increase performance a lot.

The picture is after optimization, which is really fast than original calloc all the memory. image

The last line means that the malloc and memset mInfo memory use 18 ms to memset the mInfo data structure of 67109127 Byte with total memory 1677727983 Byte and max capacity of 67108864 elements.

And here is the calloc and other memory allocation calls benchmark: image Code is here:https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2020/01/14

BTW, I have used this map and optimization as a replacement of std::unordered_map/set in clickhouse code, details on pr:https://github.com/ClickHouse/ClickHouse/pull/33180

martinus commented 2 years ago

Ah, it make sense to just initialize mInfo. That's certainly better. Can you update your PR a bit to use indentation with spaces and std::memset? like so:

diff --git a/src/include/robin_hood.h b/src/include/robin_hood.h
index 32e5641..a1b6d2d 100644
--- a/src/include/robin_hood.h
+++ b/src/include/robin_hood.h
@@ -2323,13 +2323,14 @@ private:

         auto const numElementsWithBuffer = calcNumElementsWithBuffer(max_elements);

-        // calloc also zeroes everything
+        // malloc & zero mInfo. Faster than calloc everything.
         auto const numBytesTotal = calcNumBytesTotal(numElementsWithBuffer);
         ROBIN_HOOD_LOG("std::calloc " << numBytesTotal << " = calcNumBytesTotal("
                                       << numElementsWithBuffer << ")")
         mKeyVals = reinterpret_cast<Node*>(
-            detail::assertNotNull<std::bad_alloc>(std::calloc(1, numBytesTotal)));
+            detail::assertNotNull<std::bad_alloc>(std::malloc(numBytesTotal)));
         mInfo = reinterpret_cast<uint8_t*>(mKeyVals + numElementsWithBuffer);
+        std::memset(mInfo, 0, numBytesTotal - numElementsWithBuffer * sizeof(Node));

         // set sentinel
         mInfo[numElementsWithBuffer] = 1;
zhanglistar commented 2 years ago

Ah, it make sense to just initialize mInfo. That's certainly better. Can you update your PR a bit to use indentation with spaces and std::memset? like so:

diff --git a/src/include/robin_hood.h b/src/include/robin_hood.h
index 32e5641..a1b6d2d 100644
--- a/src/include/robin_hood.h
+++ b/src/include/robin_hood.h
@@ -2323,13 +2323,14 @@ private:

         auto const numElementsWithBuffer = calcNumElementsWithBuffer(max_elements);

-        // calloc also zeroes everything
+        // malloc & zero mInfo. Faster than calloc everything.
         auto const numBytesTotal = calcNumBytesTotal(numElementsWithBuffer);
         ROBIN_HOOD_LOG("std::calloc " << numBytesTotal << " = calcNumBytesTotal("
                                       << numElementsWithBuffer << ")")
         mKeyVals = reinterpret_cast<Node*>(
-            detail::assertNotNull<std::bad_alloc>(std::calloc(1, numBytesTotal)));
+            detail::assertNotNull<std::bad_alloc>(std::malloc(numBytesTotal)));
         mInfo = reinterpret_cast<uint8_t*>(mKeyVals + numElementsWithBuffer);
+        std::memset(mInfo, 0, numBytesTotal - numElementsWithBuffer * sizeof(Node));

         // set sentinel
         mInfo[numElementsWithBuffer] = 1;

Done.

martinus commented 2 years ago

Merged, thanks!