greg7mdp / sparsepp

A fast, memory efficient hash map for C++
Other
1.25k stars 123 forks source link

Possible iterator bug? #19

Closed indigox4 closed 7 years ago

indigox4 commented 7 years ago

I'm replacing some usage of std::unordered_map with sparse_hash_map and I've encountered an occasional issue with iterators.

map_t::const_iterator iter = map.find(keyA); // after this line iter contains a good value
map[keyB] = (*iter).second; // after this line iter contains a garbage value

With this code, keyB will occasionally map to some garbage values (0xdddddddd, indicating deleted memory), and not the same value as keyA, which is what I would expect. The same code works fine with std::unordered_map.

If I store keyA's value in a temp variable everything works as expected:

map_t::const_iterator iter = map.find(keyA);
tempVal = (*iter).second;
map[keyB] = tempVal;

(Unfortunately, I can't reproduce this issue with a simple test case, only our internal code.)

Using the temp variable is fine for my usage, just curious if what I'm seeing is a bug, or expected behaviour. I saw in the readme that calls to erase() can invalidate iterators. Is the same true with insertions as well?

greg7mdp commented 7 years ago

Hi there,

Thank you using sparsepp, and for for taking the time to report this issue. Indeed iterators may be invalidated when an item is inserted into sparsepp, and that is expected behavior, but I should have stated it clearly in the readme.md (which I will update).

Please note that inserting an item into a std::unordered_map may also invalidate iterators, albeit only if this causes a resize. As stated in the doc for std::unordered_map:

"If rehashing occurs due to the insertion, all iterators are invalidated. Otherwise iterators are not affected. References are not invalidated. Rehashing occurs only if the new number of elements is greater than max_load_factor()*bucket_count()."

Using the temp variable is a good solution. I think insert would also work as the parameters would be evaluated first:

map.insert({keyB, (*iter).second});

indigox4 commented 7 years ago

Ah ok, that makes sense. I was replacing unordered_map in some third-party code and it never occurred to me the existing code could be mis-using the STL.

Just by changing a few unordered_maps to sparse_hash_maps and fixing that insertion bug, I've managed to speed up our code by about 1.5x, and reduce our memory consumption by 20% as well, so thanks for that!

greg7mdp commented 7 years ago

Well, the invalidation of iterators when inserting will occur more often with sparsepp, so I certainly should mention it in the readme. Really glad to hear that you are happy with sparsepp. Do you use it on windows or linux?

indigox4 commented 7 years ago

Sounds good, thanks again.

We release our app on Windows, OSX, and Linux, but all of the performance testing I did was on Windows 7.

-------- Original message -------- From: Gregory Popovitch notifications@github.com Date: 2016-12-13 11:27 AM (GMT-05:00) To: greg7mdp/sparsepp sparsepp@noreply.github.com Cc: Stephen indigox3@hotmail.com, Author author@noreply.github.com Subject: Re: [greg7mdp/sparsepp] Possible iterator bug? (#19)

Well, the invalidation of iterators when inserting will occur more often with sparsepp, so I certainly should mention it in the readme. Really glad to hear that you are happy with sparsepp. Do you use it on windows or linux?

- You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/greg7mdp/sparsepp/issues/19#issuecomment-266786041, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ASm-Mv3eam3GkhbwDxWFJF1nyJKafZEFks5rHsdBgaJpZM4LK5gg.

greg7mdp commented 7 years ago

OK. Then you may want to watch for a new version in a couple weeks. I have noticed that on windows, the memory saving is not as good as it should be because the windows default allocator (malloc) fragments the heap and has a very high overhead... so I'm currently implementing a special allocator for sparsepp which will solve this issue. With that new version, the memory savings will be much higher on Windows, and on par with what is currently seen on linux.

mlmsft commented 7 years ago

Hi and thanks for sharing the code,

I wonder if there has been any progress on this:

I'm currently implementing a special allocator for sparsepp which will solve this issue. With that new version, the memory savings will be much higher on Windows, and on par with what is currently seen on linux

Thanks

Michael

greg7mdp commented 7 years ago

Working on it. Sorry it is taking longer than I had forecast, there are lots of little details to get right. The allocator is working great. I'm now integrating it with sparsepp but that probably will still take a couple weeks.

greg7mdp commented 7 years ago

@mlmsft , Just curious, are you using sparsepp already or is this for a future project?

mlmsft commented 7 years ago

We are experimenting with it as an alternative to unordered_map, which is a true memory hog. At the moment, we are observing about 20% improvement due to sparsepp, which is already good, but a bit behind original expectations. So,we are looking forward to see sparsepp Windows performance catching up with the Linux version. Incidentally, sparsepp is somewhat difficult to debug in VS, but it will be a small price to pay if it cuts memory requirements.

Thanks

Michael

greg7mdp commented 7 years ago

Great, I think you'll be very pleased with the memory usage with the custom allocator. Why do you say it is difficult to debug in VS? Is it because the code is somewhat cryptic and hard to follow, or is there a specific issue? I myself debug it in VS all the time.

mlmsft commented 7 years ago

The issue with debugging I was talking about is how VS displays hash contents. Attached is a snapshot showing contents of unordered_map and , below it, sparsepp: image In the first case, we see an associative array, in the second case, the view is a bit cryptic.

greg7mdp commented 7 years ago

Oh, thanks for explaining the issue, clearly this is not helpful at all. I'll look into creating a natvis file, so that the debugger can display sparsepp objects in a more useful fashion. Hopefully it wont be too difficult.

mlmsft commented 7 years ago

Terrific. Looking forward to seeing all those changes in the library. Thanks for being so responsive!

Michael

greg7mdp commented 7 years ago

Hi Michael,

I just added a spp.natvis file which allows user-friendly visualization of sparse_hash_map/set. I Have tested it with Visual Studio 2015, but it may work for 2012/2013.

mlmsft commented 7 years ago

Yes, visualization works exactly like the built-in implementation now. It might be a small addition, but probably very helpful for VS developers using your library. I have VS 2015, just like you do, so I can only confirm that it works there. Thank you!

Michael

greg7mdp commented 7 years ago

You are welcome!

mlmsft commented 7 years ago

Hi Greg,

Any updates on memory allocator for Windows?

Thanks

Michael

greg7mdp commented 7 years ago

I am still working on the memory allocator. One thing you could try is to download the file "malloc.c" from http://g.oswego.edu/dl/html/malloc.html, and just add it to your project. It is the excellent allocator by Doug Lea. I would be very curious to hear jow it impacts the memory usage of your app.

greg7mdp commented 7 years ago

@mlmsft , just checked in the version with the new allocator. It is used by default on Windows.Give it a try and let me know what you think!

mlmsft commented 7 years ago

Hi Greg, Thanks for checking in the update. I didn't get to see its performance yet, mostly due to compilation issues. Few things that caught my (and Visual Studio's) attention:

  1. You have

    define DEBUG 0

    in spp_dlalloc.h which is likely to collide with preprocessor definitions, especially because many people use #ifdef to see if they are in debug mode. Maybe give the macro a different name?

  2. There are many non-static functions that are also defined in this file. As a result, including the spp.h header file in several source files of a library/executable is problematic
  3. Unrelated to this, but in reference to a previous solved issue: how about expanding the natvis file to show not just the actual map but also associated iterators?

Best

Michael

greg7mdp commented 7 years ago

Michael, thank you for the quick feedback!

  1. yes, I'll fix the DEBUG.
  2. I'll also check the non-static functions...
  3. natvis for iterators => will do
greg7mdp commented 7 years ago

Hi @mlmsft , I just checked in a fix for the first two issues. Hopefully it will build for you now.