martinus / robin-hood-hashing

Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20
https://gitter.im/martinus/robin-hood-hashing
MIT License
1.52k stars 146 forks source link

Insert fails with std::overflow_error again #117

Closed slavenf closed 3 years ago

slavenf commented 3 years ago

Hello.

I have encountered std::overflow_error again.

I am using robin hood version 3.10.0.

This time I cannot create test case. I hope you can give me guidelines to find source of problem. I am working with robin_hood::unordered_flat_set<int32_t>. My application is working with 200 containers and 10 million elements in total, so I cannot find good way to create test example. It takes about 5 minutes of number crunching before application crashes while trying to insert that specific integer into that specific container.

My application is basically doing this:

Exception is not thrown if I recreate standalone test case with same elements.

Thanks.

martinus commented 3 years ago

Hi, could you give this prototype a try, which might improve the overflow_error issue: https://raw.githubusercontent.com/martinus/robin-hood-hashing/2021-03-rehash-on-overflow-experiment/src/include/robin_hood.h

slavenf commented 3 years ago

Unfortunately, problem still exists. Nothing changed.

I have enabled ROBIN_HOOD_LOG_ENABLED, ROBIN_HOOD_TRACE_ENABLED and ROBIN_HOOD_COUNT_ENABLED.

And this is log before program crashes:

...
...
...
insert@0x737: 0x558cc8716668
emplace@0x6ee: 0x558cc8716668
insert@0x737: 0x558cc8716668
emplace@0x6ee: 0x558cc8716668
insert@0x737: 0x558cc8716668
emplace@0x6ee: 0x558cc8716668
insert@0x737: 0x558cc8716668
emplace@0x6ee: 0x558cc8716668
insert@0x737: 0x558cc8716668
emplace@0x6ee: 0x558cc8716668
try_increase_info@0x93a: mInfoInc=0x2, numElements=0x4c0f, maxNumElementsAllowed=0xcccc
increase_size@0x962: mNumElements=0x4c0f, maxNumElementsAllowed=0xcccc, load=29.7104
ERROR: Process terminated due to unhandled exception. what(): robin_hood::map overflow
martinus commented 3 years ago

Thanks for trying. You didn't use any custom hash or std::hash, just plain robin_hood::unordered_flat_set<int32_t>?

I can't reproduce this so far. Can you give more details how the maps interact? are the numbers you insert more or less random int32_t values?

slavenf commented 3 years ago

I do not use any custom hash or std::hash. I am using just plain robin_hood::unordered_flat_set<int32_t>.

My program takes file as input data. Program process that data and outputs result into file. During processing program uses robin_hood sets. Output is different for different input file, so I can say that numbers are more or less random int32_t values. I am not using random number generator, of course. Problematic set crashes after numerous container::insert, container::erase and container::clear operations. I found out that set crashes only in this case when operations on container are applied in specific order. Unforutunately, I am unable to track order of operations on every container. I have a few more ideas to try. I will inform you.

mxncr commented 3 years ago

Hi, thanks for the great library. For information, we have the same issue when using robin_hood::unordered_flat_set<uint32_t> (robin_hood version 3.9.1) on MacOS (clang 12.0.0). The overflow exception happens rarely, but when it does it is quite a big issue for the larger program. I have not seen the problem triggered on Linux (gcc 10.2.0).

Update: on our data, the issue is fixed by the previous prototype (https://raw.githubusercontent.com/martinus/robin-hood-hashing/2021-03-rehash-on-overflow-experiment/src/include/robin_hood.h)

slavenf commented 3 years ago

I've got it!!!

I have created test example. Test program is in file robin-hood-test.cpp. Test data are in file test-data.txt. Test program automatically reads test data. Program crashes with roobin hood version 3.10.0. Everything works fine with version 3.9.1.

Program has map std::map<std::size_t, robin_hood::unordered_flat_set<int32_t>> sets. Key is id of set, and value is set.

Test data are written in format

$$$ <id> construct                      --> calls sets[id]
$$$ <id> move_construct <other_id>      --> calls sets[id] = std::move(sets[other_id])
$$$ <id> destroy                        --> calls sets.erase(id)
$$$ <id> insert <elem>                  --> calls sets[id].insert(elem)
$$$ <id> erase <elem>                   --> calls sets[id].erase(elem)
$$$ <id> clear                          --> calls sets[id].clear()

robin-hood-test.tar.gz

slavenf commented 3 years ago

Hi, could you give this prototype a try, which might improve the overflow_error issue: https://raw.githubusercontent.com/martinus/robin-hood-hashing/2021-03-rehash-on-overflow-experiment/src/include/robin_hood.h

Now I see that I forgot test that version in example above. I have just run with that version and example above does not fails, but my program does fail with same data. I tried to track all operations in my program but maybe I missed something.

slavenf commented 3 years ago

Me, again. Sorry for too much messages. I am too excited about this problem. I have attached new test data that fails with robin hood from this post:

Hi, could you give this prototype a try, which might improve the overflow_error issue: https://raw.githubusercontent.com/martinus/robin-hood-hashing/2021-03-rehash-on-overflow-experiment/src/include/robin_hood.h

You have to use my test example. test-data-new.txt.tar.gz

martinus commented 3 years ago

Thanks for that! I'll see if I can reproduce it.

martinus commented 3 years ago

~I can reproduce the issue, and am trying to get the minimal set of data for the reproducer. It seems that just the inserts into id 22 and id 26 are enough to reproduce this... so this is clearly a bug somewhere~

It seems to be just map 22, so this is yet another case of bad hashing luck. My attempt to improve this did not work, but thanks to your reproducer I have another improvement, could you try this one please: https://raw.githubusercontent.com/martinus/robin-hood-hashing/2021-03-rehash-on-overflow-experiment/src/include/robin_hood.h

slavenf commented 3 years ago

..... could you try this one please: https://raw.githubusercontent.com/martinus/robin-hood-hashing/2021-03-rehash-on-overflow-experiment/src/include/robin_hood.h

It does not work. I have attached new test data. I have also attached new program. It accepts input file as argument.

robin-hood-test.cpp.tar.gz test-data-3.txt.tar.gz

martinus commented 3 years ago

Thanks again for the data! This makes it very easy for me to reproduce. Here is another update which works for me with both your test data: https://raw.githubusercontent.com/martinus/robin-hood-hashing/2021-03-rehash-on-overflow-experiment/src/include/robin_hood.h

slavenf commented 3 years ago

Thanks again for the data! This makes it very easy for me to reproduce. Here is another update which works for me with both your test data: https://raw.githubusercontent.com/martinus/robin-hood-hashing/2021-03-rehash-on-overflow-experiment/src/include/robin_hood.h

Thank you for quick response. That version works without problems. I will keep testing with that version and inform you if exception happens again.

And thank you for this great library. I like it because of speed and low memory consumption.

martinus commented 3 years ago

@slavenf did it work for you so far? If it does, I'd like to create a new release.

slavenf commented 3 years ago

Hello. I have been running test cases since last week. It works without problems.

martinus commented 3 years ago

That's great, thanks!

lars-bittrich commented 3 years ago

Hello, I just stumbled onto that same issue and thus found this discussion here. Before that I also prepared a small example file. For me the issue of the std::overflow_error is also fixed with the latest commit.

However, I noticed a very serious performance regression with this new patch if I compare to std::unordered_set of a factor of 5 or so in my example. I also tested the example under macos (Apple clang version 12.0.0 (clang-1200.0.32.29)) and windows (MSVC 19.28.29337). The previous overflow error was only observable with Apple clang.

Here is the code:

#include "robin_hood.h"
#include <iostream>
#include <unordered_set>

int main() {
  size_t N=100000, Nsmall=10000;
  size_t count=0;

  //std::unordered_set<size_t> tasks;  // this works always
  robin_hood::unordered_flat_set<size_t> tasks;
  //tasks.reserve(N); //robin_hood set works with this

  tasks.insert(0);
  while(tasks.size()>0) {
    size_t index = *tasks.begin();
    tasks.erase(tasks.begin());

    if ((count<N) && (tasks.size()<Nsmall)) {
      tasks.insert(count);
      ++count;
      tasks.insert(count);
      ++count;
    }

  }
  std::cout << "Tasks done" << std::endl;

  return 0;
}
martinus commented 3 years ago

@lars-bittrich, that benchmark looks too simplistic. Inserting consecutive numbers is not something I optimize for, in your benchmark simple deque or a vector based circular buffer would be sufficient and much faster. (I've edited your comment to be better formatted)

lars-bittrich commented 3 years ago

@martinus Thanks for your answer! You are correct of cause. This code in itself does not make sense and initially it was not intended as a benchmark but to show the overflow_error. My original use case had unpredictable numbers which were more like numbers drawn randomly from a finite set and I had no upper limit of the task set size. The upper limit was only needed to provoke the crash. However, I quickly checked with random numbers between 1 and N instead of "count" in the set inserts and basically got the same result in terms of speed difference. But I definitely do not claim it to be a perfect benchmark case. So thanks for your great work with the library in any case!

jmbnyc commented 3 years ago

I have run into a a similar overflow issue and I think the current behavior is unacceptable, e.g. throw an exception. I would like to understand if you have any thoughts on how to avoid "overflow", e.g. keep trying or is the nature of the code such that it can't be avoided? If it can't be avoided then I'll not be able to use this map (which is a shame because it is quite good compared to other hash maps wrt size and speed). In my case, the keys are all strings. I would try to create a repro but it failed in a production system and there is no log of the input values.

martinus commented 3 years ago

If you have a bad hash function, say something like this:

struct hash {
    size_t operator()(Whatever const& x) const {
        return 0;
    }
};

This currently can't be avoided. One workaround would be if that happens to switch the 1-byte info byte to use more bytes, but this would be very significant update that I unfortunately currently don't have the time for.

jmbnyc commented 3 years ago

Martin, I don't have a bad hash function and because of the overflow I had to stop using the map in favor of the tsl robin map which does not have any overflow issues.

martinus commented 3 years ago

do you remember which version you tried? Since release 3.11.1 the overflow issue should be greatly improved

jmbnyc commented 3 years ago

3.11.1

slavenf commented 3 years ago

@martinus I would like to say that I have not encountered any error since last time but I am afraid to use robin_hood in production version. @jmbnyc Do you still have problems? Current version is 3.11.3. Did you try that version?

martinus commented 3 years ago

Please try the latest release