serge-sans-paille / frozen

a header-only, constexpr alternative to gperf for C++14 users
Apache License 2.0
1.28k stars 101 forks source link

Generate an error when duplicate keys are present #175

Open Shizcow opened 1 month ago

Shizcow commented 1 month ago

In my project, I have a large unordered_map that I use to store command line debug flags and attributes for those flags. I was performing some copy-paste programming to add a few new flags and accidentally created a map with duplicate keys -- an easy mistake. However, realizing that the duplicate value was present was extremely difficult because there's no tailored error message produced. Instead, frozen seems to happily keep trying new seeds infinitely, expecting the duplicate keys to eventually produce different hash values. This will either run over the constexpr limit or, if the limit is raised, cause GCC to consume all the memory on the system and die. Not a very helpful failure case for determining the root cause of a duplicate key issue.

Here's a minimal repro:

constexpr auto some_map =
  frozen::make_unordered_map<frozen::string, unsigned int>({
      {"a", 0},
      {"a", 1}
    });

This will cause an infinite loop in constexpr evaluation and lead to:

in ‘constexpr’ expansion of ‘d.frozen::bits::seed_or_index::value()’
error: ‘constexpr’ evaluation operation count exceeds limit of 33554432 (use ‘-fconstexpr-ops-limit=’ to increase the limit)
   17 |       });
      |        ^

Or if the limit is raised, eventually consuming all system memory and killing GCC.

Is is possible to trigger a more descriptive error message? Even if the exact keys can't be printed out in a diagnostic, something like:

static_assert(has_duplicates, "Your map has duplicate keys somewhere");

would be immensely helpful in pointing to a solution.

cbeck88 commented 1 month ago

When I worked on this years ago, I remember encountering the same issue, which makes it more annoying to use, but it wasn't completely straightforward to fix. Here's what I remember.

  1. One might want to view this as input validation, and have a separate routine that checks for duplicates. That's kind of nice, then there is separation of concerns. However, what are you going to do in that routine. You can't just build a hashmap in constexpr function, that's kind of the point of this crate. There are constexpr sort routines, but then you are committing frozen to only supporting sort-able keys I guess. And you probably don't want to just do a naive O(n^2) duplicate check because you don't want to have a scenario where the input validation takes longer than the hash table generation.
  2. If you don't want to do it as input validation, then you have to do it in the algorithm itself. Probably, what you need to do is change the algorithm so that when it's building the table, it first builds a version where the keys are stored with the values inside the map. Then you can detect duplicates and bail out. But then, you probably want to strip out keys and only store the values in the final result. That changes the type of the carray, so it's fairly finicky. I never got to the point of having a working patch -- a big part of working on this code was working through the compiler fight around constexpr.

For me this like 6 years ago, maybe the compilers are less finicky about constexpr nowadays. HTH