martinus / robin-hood-hashing

Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20
https://gitter.im/martinus/robin-hood-hashing
MIT License
1.5k stars 142 forks

A way to disable the bulk allocator and/or stop it from holding memory forever #107

Closed cculianu closed 3 years ago

cculianu commented 3 years ago

It seems to me the recent memory leak bug was at its heart partially caused by the design decision to have a custom bulk allocator that never frees.

I am not sure in what reality a map that never frees is a great idea. Such maps will, over time, end up holding the maximum memory they ever used at any point in their history. That seems like quite the sacrifice. What's more, depending on the usage pattern, one can imagine pathological situations where the allocator is holding onto gigabytes of tiny pieces of memory but cannot satisfy the current request from them, so it just allocates one ginormous new piece, taking up 2x the memory it actually needs. And since it never frees, you are now stuck with 2x the memory consumption your hashtable ever required, until it's destructed. And so on...
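To illustrate the behavior being described, here is a minimal, hypothetical sketch of a never-freeing bulk allocator (this is not the library's actual code; names like `NeverFreePool` are made up for illustration). Recycled nodes go on an internal free list, but nothing is ever handed back to the OS until the pool itself dies:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the pattern under discussion: a bulk allocator
// that hands out fixed-size nodes, recycles them on deallocate(), and
// never returns memory to the system until it is destroyed.
class NeverFreePool {
public:
    explicit NeverFreePool(std::size_t nodeSize) : mNodeSize(nodeSize) {}

    void* allocate() {
        if (!mFreeList.empty()) {      // reuse a previously recycled node
            void* p = mFreeList.back();
            mFreeList.pop_back();
            return p;
        }
        // nothing to recycle: grab a fresh block and keep it forever
        mBlocks.emplace_back(mNodeSize);
        return mBlocks.back().data();
    }

    // "Freeing" only caches the node; the memory stays owned by the pool.
    void deallocate(void* p) { mFreeList.push_back(p); }

    // High-water mark: total bytes held, never shrinks.
    std::size_t bytesHeld() const { return mBlocks.size() * mNodeSize; }

private:
    std::size_t mNodeSize;
    std::vector<std::vector<char>> mBlocks; // owned until destruction
    std::vector<void*> mFreeList;           // recycled nodes
};
```

With a pool like this, `bytesHeld()` after deallocating everything is exactly the same as at the peak, which is the "high-water mark" behavior the issue complains about.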

On top of that, not everybody is stuck with glibc's mediocre allocator. There are good allocators out there -- such as jemalloc.

I believe the bulk allocator just exists so you can win benchmark competitions but isn't a feature that, as it stands right now, is ideally suited for real-world applications.

Please consider offering some knobs on the bulk allocator, such as the ability to disable it outright or to limit its maximum memory usage. Why should my app forever have to suffer because it once needed 5GB of memory two weeks ago during a load spike? Now I'm stuck with a hashtable that's empty but is using 5GB of memory, because the allocator hangs on to every piece of memory it sees forever. The fact that the allocator never "forgets" about its memory is a real problem.

This hash table you designed is very fast, but it's supposed to be general purpose. I perhaps should have read the fine print beforehand. I never realized that calling reserve(1024) repeatedly could leak memory, or that it never frees any memory you ever give it.

jemalloc is good. It already does a lot of the work this library does, but in a general-purpose way. If I use jemalloc, why do I also need your allocator hogging memory?

Please offer some knobs and/or the ability to disable it. A nearly empty hashtable taking up 5GB is ridiculous. It may be great for benchmarks, but it's just comical otherwise.

martinus commented 3 years ago

I am not sure in what reality a map that never frees is a great idea.

It's exactly the same principle that a vector or a string uses, so I guess I'm not alone with this design :) You might also want to try switching between the flat_map and node_map. The node_map has fewer memory spikes; for my use cases, those spikes are generally the biggest issue.
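The vector analogy can be demonstrated directly: `std::vector::clear()` is guaranteed to leave the capacity untouched, and releasing the buffer takes an explicit step. A small sketch (the function names here are illustrative, not from any library):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// clear() destroys the elements but keeps the high-water allocation,
// which is the same retention policy a bulk allocator follows.
std::size_t capacityAfterClear(std::size_t n) {
    std::vector<int> v(n, 42);
    std::size_t highWater = v.capacity();
    v.clear();                          // elements gone, buffer kept
    assert(v.capacity() == highWater);  // guaranteed by the standard
    return v.capacity();
}

// Actually releasing the buffer takes an explicit request, e.g. the
// classic "swap trick" (or shrink_to_fit, which is non-binding).
std::size_t capacityAfterSwapTrick(std::size_t n) {
    std::vector<int> v(n, 42);
    v.clear();
    std::vector<int>().swap(v); // trade buffers with an empty temporary
    return v.capacity();        // 0 on mainstream implementations
}
```

So a container that quietly holds its peak allocation until told otherwise is indeed standard-library behavior, not something unique to this hashmap.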

If memory usage is an issue, I can highly recommend https://github.com/Tessil/sparse-map. This is a very fast hashmap with very low memory requirements. That might be a better trade-off for your application. See some benchmarks here: https://martin.ankerl.com/2019/04/01/hashmap-benchmarks-03-01-result-InsertHugeInt/

The reserve(1024) memory leak was a bug that has been fixed, sorry about that issue!

cculianu commented 3 years ago

Ok, thanks for the update and the tip on using that map. I appreciate your reply. Sorry if my initial issue was a bit polemic -- I was stressed at having found the memory leak and.. well, I was just stressed.

Keep up the good work, and thanks!

(and do consider disabling that feature or allowing it to not exist somehow.. eek!).