greg7mdp / sparsepp

A fast, memory efficient hash map for C++

feedback please! #17

Closed greg7mdp closed 3 years ago

greg7mdp commented 7 years ago

This is not a real issue, but just a space to provide feedback on sparsepp, good or bad. I'd love to hear what you think. Thank you in advance.

dratchkov commented 7 years ago

I liked the reduced memory footprint and am switching my code to this where appropriate.

greg7mdp commented 7 years ago

Thanks David!

romange commented 7 years ago

@greg7mdp Is it possible to open source the benchmarking code? Could be part of this or separate repository...

greg7mdp commented 7 years ago

@romange, just added the repository hash-table-shootout which is the benchmarking code I used (updated from https://github.com/aggregateknowledge).

romange commented 7 years ago

@greg7mdp Thanks!

dselivanov commented 7 years ago

Awesome work. I made a small R package which makes it easy to use sparsepp from R and Rcpp - https://github.com/dselivanov/sparsepp.

Initially I evaluated sparsepp in another of my projects (https://github.com/dselivanov/text2vec), where the main data structure is unordered_map< pair<uint32_t, uint32_t>, T >, with T an int or float. The memory improvement was 2x and the speed-up was 1.5x (lookup and insert operations).

Thank you for amazing work.

greg7mdp commented 7 years ago

Thank you, @dselivanov, I really appreciate the feedback and I am glad you find sparsepp useful.

kmohanram commented 7 years ago

so, i evaluated sparsepp over the break as a drop-in replacement for std::unordered_map in a personal library (header, more like) that i use in my work. i have tuned the library for performance, and at first blush it seems that sparsepp can at best match the STL container in my benchmarking on a test suite that i consider representative. in past incarnations, my library used dense_hash_map with good results (0-20% improvement), but it seems like the STL implementation has bridged the gap with emplace/reserve/etc. at this point. although i cannot share the test suite, is there a good harness/logging approach that i can use to capture the trace and help move the sparsepp effort along? in my library, the key is a user-defined class that i usually hash to a uint32_t, and the value is also a uint32_t (an array index). no deletions occur in the lifetime of a single run, and i can happily reserve space for the max number of entries that i expect to see in a run (20 million, give or take). cheers!

greg7mdp commented 7 years ago

@kmohanram, thanks for the feedback!

dense_hash_map should be much faster than the STL's unordered_map, I am surprised that you see only a 0-20% improvement.

So I suspect that the time is spent somewhere other than in the hash table code itself. Maybe the hash implementation for your 'key' class is taking most of the time. Or maybe copying it takes a long time? Do you have a move constructor?

kmohanram commented 7 years ago

std::hash is a pass-through, and my hash simply adds three uint32_t fields in the user-defined class and returns that value. turns out that value lands (serendipitously, if i may confess) darn close to, and a little over, the number of entries in the hash table. so, it is almost like an insertion hint to an empty place in the hash map, or a collision that more than likely is a hit (i.e., this is not a unique entry). my attempts to refine the hash further have fallen flat with measurable performance hits. so, i have eliminated the hash as a possible suspect already. the load factor confirms that the distribution is near uniform and that i am over-provisioning (which i am not worried about here and now).

re: copies and/or the move constructor, it is a trivial class with only these three uint32_t fields. ok, lest i seem like a joker, it is a packed class and the three fields collapse to a single uint64_t, but i am letting the compiler do the bit-twiddling for me. and also using all the compiler-defined constructors/etc. so unless i am wildly off, i cannot suspect anything here as well. but hey, life (and C++) is (are) full of surprises.

btw, since dense_hash_map does not quite support C++11, i hoped that sparsepp would fill in the blanks and improve performance by at least 10%. all i literally did was redefine the map and everything ran with a minor drag of 5% or so ...

kmohanram commented 7 years ago

i just ran the suite with the call to reserve commented out and i see a 20% performance degradation with sparsepp (on my mac, fwiw, since that can also affect these numbers). happy to work with you to help comprehend and resolve this, even if the use case is an anomaly and even if it is unique to the mac ...

interestingly enough, the map's load factor for the STL run is 0.80 while the load factor for the sparsepp run is 0.30 when i run with reserve commented out. with reserve in use, the load factors in both cases are 0.6ish.

greg7mdp commented 7 years ago

@kmohanram, for your hash function, can you try the following:

struct Hasher {
    size_t operator()(const T &t) const
    {
        // FNV-1a style mix of the three 32-bit fields
        const uint32_t base = 2166136261U;  // FNV offset basis
        const uint32_t p    = 16777619U;    // FNV prime

        size_t res = base;
        res ^= t.int1;
        res *= p;
        res ^= t.int2;
        res *= p;
        res ^= t.int3;
        res *= p;

        return res;
    }
};
kmohanram commented 7 years ago

well, this hash function (with reserve still in place) is a drag to the extent of 40% on the STL implementation. and sparsepp seems to drag by a further 10% over the STL implementation at first blush. it took a 35+% beating in several instances btw, and never did better.

kmohanram commented 7 years ago

i just think my hash function is -- serendipitously, i must ack again -- way too good for my application. i recall trying other prime-based hashing approaches when i used an array instead of a hash_map to store my entries and nothing came even close, unfortunately/fortunately. btw, do you have any suggestions as to how i can capture a meaningful subset of map-related ops (inserts/lookups) within my app so that we can abstract to a testbench and you can debug offline? i'm thinking all emplace calls need to be logged with the respective inputs. though my application control path depends on the results of these ops, at least you will have something to work with, especially if the results are consistent (sparsepp underperforms std::unordered_map).

greg7mdp commented 7 years ago

@kmohanram, please open a separate issue so we don't hijack this one.

From what you are writing, it looks like the hash is indeed not the issue.

What you could do is dump the inserts/lookups into a binary file, to be replayed later with a program like the one in issue #20 (closed now). If you do that I'll be happy to see if I can find a problem. However, the fact that there is no big difference between std::unordered_map and dense_hash_map makes me think that the time may be spent outside of the hash table code. This will be easy to check if you implement the binary file testing method I suggest above.
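One possible shape for that binary trace, sketched with hypothetical record and function names (only key traffic is recorded, which is usually enough to replay hash-table behavior against any implementation):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical trace record: op code (0 = insert, 1 = lookup) plus the
// key, assumed reducible to 64 bits for logging purposes.
struct TraceOp {
    uint8_t  op;
    uint64_t key;
};

// Append one map operation to an open binary file as it happens.
void log_op(std::FILE* f, uint8_t op, uint64_t key) {
    TraceOp rec{op, key};
    std::fwrite(&rec, sizeof rec, 1, f);
}

// Read the whole trace back for offline replay.
std::vector<TraceOp> load_trace(const char* path) {
    std::vector<TraceOp> ops;
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return ops;
    TraceOp rec;
    while (std::fread(&rec, sizeof rec, 1, f) == 1)
        ops.push_back(rec);
    std::fclose(f);
    return ops;
}
```

Replaying the loaded ops against both std::unordered_map and sparsepp, with timing around the loop, isolates the hash-table cost from the rest of the application.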

dratchkov commented 7 years ago

@greg7mdp @kmohanram I just opened #21, I wonder if this is related to the issue seen here.

MichaelVoelkel commented 7 years ago

Hi Greg! I love your hash map and use it in my research where I have hundreds of millions and sometimes billions of entries with hundreds of thousands of accesses per second. Your map turns out to be the fastest among the ones I have tested and I am very pleased with it. Thanks for this!

greg7mdp commented 7 years ago

Wow, thanks @Elypson, really glad you like sparsepp. It is good to know that people out there use it and appreciate it.

chondl commented 7 years ago

I'm looking forward to using this in our application to reduce memory usage. Currently we are using STL unordered_map in an application compiled with Emscripten to run in the browser. Happy to report that the library worked as advertised as a drop-in replacement and our automated test suite found no bugs after the change.

Two questions:

greg7mdp commented 7 years ago

@chondl , thanks for the feedback.

chondl commented 7 years ago

No worries about unordered_multimap, we'll figure something out. Glad to have the space- and time-efficient hash. Even having a total of the memory allocated, tracked outside of malloc, would be helpful.

AVJoKe commented 7 years ago

I would second the request for a unordered_multimap replacement, since some nested solutions won't work for me (like <key,std::vector>). Nevertheless I implemented your sparse_hash_map where it was feasible and could see great performance improvements compared to STL. Keep up the good work!

greg7mdp commented 7 years ago

@chondl , @AVJoKe , I looked into implementing unordered_multimap, and unfortunately I don't think it can be implemented efficiently. One issue is that the equal_range() method needs to return two iterators allowing iteration over all elements with the same key.

Because sparsepp uses open addressing and not chaining, it would be difficult to implement something that doesn't require an additional container per entry (like a std::vector), and that would be very inefficient both time- and space-wise.

Maybe you can use Google's cpp-btree instead? It is very memory efficient, although the lookup is slower than sparsepp. But it supports the multimap.

samedsaka commented 7 years ago

hi @greg7mdp, i would like to thank you for sparsepp. i am using it for my research, where i have hundreds of millions of genomic entries. i need some advice about searching in sparse_hash_map. is there any efficient way to search in a serialized hash map?

greg7mdp commented 7 years ago

@samedsaka , thank you, I'm glad you like sparsepp! About your question, are you asking about doing a search on the serialized file on disk, without loading it into memory? If that is the question, I'm afraid I don't think it would be easy, the serialization format is not designed for lookup, but only for doing a save/restore of the hash map on disk.

jackstockdale commented 7 years ago

Hi - just a quick note to thank you for sparsepp. We now use it extensively in place of google’s sparse_hash_map and various unordered_maps after significant testing showed numerous real world performance gains across a set of highish memory (200GB or more) applications. No issues or other feedback, but thanks!

greg7mdp commented 7 years ago

@jackstockdale, thank you so much for the feedback! I am really glad people use sparsepp... sometimes it is hard to tell. I very much appreciate you taking the time to let me know.

kwangswei commented 6 years ago

Great sparsepp!!! I had to put 6B 64-bit integers into a set container to filter out duplicate numbers; it was very fast and stable. Thank you so much. I love it!

OvermindDL1 commented 6 years ago

I'm using sparsepp as kind of like a sparse array in high-performance lookups and it is fantastic, well outperformed the other attempts in a variety of other libraries. I love it!

greg7mdp commented 6 years ago

Thanks for the feedback @kwangswei and @OvermindDL1, I am delighted you find sparsepp useful!

jmbnyc commented 6 years ago

In an earlier post, someone asked if you have a "memory byte count" API to obtain the number of memory bytes allocated by the map/set. You had said that you were going to add it. Did you? If yes, what is the API call to obtain the memory byte count?

srbehera11 commented 6 years ago

Hi Greg, I am using your hash-map in one of my projects and it seems the memory improvement is 2X but I cannot see any time improvement. I am using the following hash-map

sparse_hash_map<uint64_t, pair<uint8_t, uint32_t>> MAP

greg7mdp commented 6 years ago

Thanks for the feedback @srbehera11 . sparsepp's main claim to fame is its low memory usage; there is no guarantee that you will see a speed improvement. It will depend on what you compare it with (there are different implementations of unordered_map in different C++ standard libraries).

It may also depend on the quality of your hash function. You could try #define SPP_MIX_HASH 1 before including spp.h to see if it makes a difference.

greg7mdp commented 6 years ago

someone asked if you have a "memory byte count" API to obtain the number of memory bytes allocated by the map/set.

@jmbnyc , it would not be easy to provide this functionality in a meaningful way, as it depends on the overhead of the memory allocator used. sparsepp allocates many small buffers, so the memory allocator's padding can have a big influence on the amount of memory actually used.

jmbnyc commented 6 years ago

Greg, Thanks for the response. I would argue that returning what the map allocates regardless of the allocator would be meaningful. Why? Well, IMO it allows one to understand the amount of memory allocated vs the memory to hold an array of the objects contained in the map, e.g. it provides the overhead associated with the map storing the elements at any point in time.

jmbnyc commented 6 years ago

Greg, One more thing. I observed this from coverity (from sparse_growth_policy.h):

static constexpr bool is_power_of_two(std::size_t value) {
    return value != 0 && (value & (value - 1)) == 0;
}

Coverity report:

  1. Condition value != 0, taking false branch.
  2. overflow: Subtract operation overflows on operands value and 1UL.

CID 10488 (#1 of 1): Overflowed return value (INTEGER_OVERFLOW)

  1. overflow_sink: Overflowed or truncated value (or a value computed from an overflowed or truncated value) value != 0UL && (value & value - 1UL) == 0UL used as return value.
greg7mdp commented 6 years ago

@jmbnyc If you want to see what sparsepp allocates, you could provide an allocator to that sparse_hash_map that keeps track of the memory allocated.
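A minimal sketch of such a tracking allocator, under the assumption that a global byte counter is acceptable (shown with std::unordered_map; spp::sparse_hash_map accepts an allocator in the same last template position):

```cpp
#include <cstddef>
#include <functional>
#include <new>
#include <unordered_map>
#include <utility>

// Running total of bytes currently requested through this allocator.
// A global keeps the sketch short; production code might share a
// counter per container instance instead.
static std::size_t g_allocated = 0;

template <class T>
struct CountingAlloc {
    using value_type = T;
    CountingAlloc() = default;
    template <class U> CountingAlloc(const CountingAlloc<U>&) {}

    T* allocate(std::size_t n) {
        g_allocated += n * sizeof(T);  // count bytes requested
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        g_allocated -= n * sizeof(T);
        ::operator delete(p);
    }
};
template <class T, class U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

using TrackedMap = std::unordered_map<int, int, std::hash<int>, std::equal_to<int>,
                                      CountingAlloc<std::pair<const int, int>>>;
```

Note this counts bytes requested by the container, not bytes actually consumed: per-allocation overhead and padding inside malloc remain invisible, which is exactly the caveat raised above.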

about the coverity report, this doesn't look like sparsepp code.

jmbnyc commented 6 years ago

Greg, Hmm. Let me check.

jmbnyc commented 6 years ago

Greg, Apologies. This was from a test setup where I was comparing multiple implementations.

greg7mdp commented 6 years ago

Jeffrey, no problem, I really appreciate the feedback and comments.

SlowPokeInTexas commented 6 years ago

Here's what I used to get a handle on memory. I didn't use it for sparsepp, but I did use it for std::map to get a handle on how much memory each entry would take. I basically wrote a small program that allocated one thousand entries and divided the memory used by 1000. Note for MemTrack to work everything has to be dynamically allocated, which is easy to do in a sample program. Anyway, I hope this is helpful.

http://www.almostinfinite.com/memtrack.html


srbehera11 commented 6 years ago

@greg7mdp Thanks for your reply !! I tried #define SPP_MIX_HASH 1 before including spp.h, but it did not make any difference.

I am now using a vector of maps, vector< sparse_hash_map<uint32_t, uint32_t> > MAP(64), where the key is a hash value of an element generated by MurmurHash3_x86_32. In my program, I need to deallocate the memory of some maps based on conditions. Will MAP[i].clear(); deallocate the memory of the ith map?

greg7mdp commented 6 years ago

@srbehera11, MAP[i].clear(); will not deallocate all the memory. I suggest you do instead: sparse_hash_map<uint32_t, uint32_t>().swap(MAP[i]).

You may want to use a typedef:

typedef sparse_hash_map<uint32_t, uint32_t> SMap;
SMap().swap(MAP[i]);
srbehera11 commented 6 years ago

@greg7mdp Thanks !! What exactly does the swap() function do? Currently, my program takes 2GB for 135M entries. I just noticed that you mentioned here changing #define SPP_ALLOC_SZ 0 to #define SPP_ALLOC_SZ 1 to increase memory efficiency. I need to change that in the spp.h file, right?

greg7mdp commented 6 years ago

@srbehera11, swap() swaps the contents of the two maps, so the one in the vector is now empty, as it was replaced by a newly created one (and the temporary is destroyed). You can try -D SPP_ALLOC_SZ=1 on the compiler command line, no need to change spp_config.h.

eglaser77 commented 5 years ago

@greg7mdp Is there any way to get the total memory usage from the map? Something like cpp-btree's bytes_used function (https://github.com/algorithm-ninja/cpp-btree/blob/master/btree.h#L1173)? Thanks!

greg7mdp commented 5 years ago

any way to get the total memory usage from the map?

@eglaser77, Unfortunately, there is no exact way to provide this information. sparsepp makes many small memory allocations, and while it would be possible to compute the memory requested, each memory allocation has an overhead which is unknown to sparsepp, so the returned result would not be accurate.

However, you could use the function GetProcessMemoryUsed() from spp_memory.h to measure memory usage before and after filling the sparsepp container, to get the total memory usage.
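The before/after measurement looks roughly like this; a Linux-only /proc reader is sketched here as a stand-in, since GetProcessMemoryUsed() in spp_memory.h wraps comparable platform-specific queries:

```cpp
#include <cstddef>
#include <cstdio>
#include <unistd.h>  // sysconf (POSIX)

// Resident set size of the current process in bytes, read from
// /proc/self/statm (Linux-only sketch; returns 0 where unavailable).
std::size_t process_mem_bytes() {
    long pages = 0, resident = 0;
    std::FILE* f = std::fopen("/proc/self/statm", "r");
    if (!f) return 0;
    if (std::fscanf(f, "%ld %ld", &pages, &resident) != 2) resident = 0;
    std::fclose(f);
    return static_cast<std::size_t>(resident) * sysconf(_SC_PAGESIZE);
}
```

Measure once before filling the container and once after; the difference approximates the container's footprint, modulo whatever else the process allocates in between.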

eglaser77 commented 5 years ago

However, you could use the function GetProcessMemoryUsed() from spp_memory.h to measure memory usage before and after filling the sparsepp container, to get the total memory usage.

@greg7mdp Thanks for the response. I'll take a look at GetProcessMemoryUsed but I assume it gets all system memory and so it would be conflated with whatever other allocations/deallocations are going on at the same time as the map is being used.

greg7mdp commented 5 years ago

@eglaser77 I believe this will return the memory used by your process only, not the whole system.

JerryKwan commented 5 years ago

Is there a way to use this from golang? thank you