microsoft / STL

MSVC's implementation of the C++ Standard Library.
Other
10.2k stars 1.51k forks source link

<regex>: std::regex uses a nonstandard regex constant "error_syntax" #438

Open BillyONeal opened 4 years ago

BillyONeal commented 4 years ago

Describe the bug The standard has a set of std::regex_constants values defined for regex errors, but doesn't define a value for every possible kind of error; for example, an empty pattern that only contains lookbehind. We probably need to add the missing constant to the standard somehow.

Boost uses error_bad_pattern for this case.

Command-line test case STL version (git commit or Visual Studio version): Visual Studio 2019 version 16.4

C:\Users\bion\Desktop>type test.cpp
#include <regex>
#include <stdio.h>

int main() {
  try {
    std::regex r(R"((?<=abc))");
    puts("Parsing regex succeeded.");
  } catch (const std::regex_error &r) {
    switch (r.code()) {
    case std::regex_constants::error_collate:
    case std::regex_constants::error_ctype:
    case std::regex_constants::error_escape:
    case std::regex_constants::error_backref:
    case std::regex_constants::error_brack:
    case std::regex_constants::error_paren:
    case std::regex_constants::error_brace:
    case std::regex_constants::error_badbrace:
    case std::regex_constants::error_range:
    case std::regex_constants::error_space:
    case std::regex_constants::error_badrepeat:
    case std::regex_constants::error_complexity:
    case std::regex_constants::error_stack:
      puts("Got a standard regex error code.");
      break;
    default:
      puts("Got a nonstandard regex error code.");
      break;
    }

    puts(r.what());
  }
}

C:\Users\bion\Desktop>cl /EHsc /W4 /WX /std:c++latest /nologo .\test.cpp
test.cpp

C:\Users\bion\Desktop>.\test.exe
Got a nonstandard regex error code.
regex_error(error_syntax)

C:\Users\bion\Desktop>

Expected behavior The standard should describe every possible value regex_error::code() can return.

This is a dual of Microsoft-internal VSO-173840 / AB#173840.

vNext note: Resolving this issue will require breaking binary compatibility. We won't be able to accept pull requests for this issue until the vNext branch is available. See #169 for more information.

StephanTLavavej commented 1 year ago

We checked libstdc++ and libc++'s behavior, and it's as Tolkien said: "go not to the STLs for advice, for they will tell you error_syntax and error_paren and error_badrepeat" :elf: : https://godbolt.org/z/nvPz5rnaf

There should definitely be an LWG issue for this, ideally saying exactly which error code should be thrown (or [re] should be dropped entirely :smirk_cat: :unicorn: :joy_cat:).

StephanTLavavej commented 1 year ago

Removing vNext because we believe that although this would be a behavioral change, it wouldn't break ABI (it's worth noting that we've gotten more aggressive about behavioral changes since the beginning of the GitHub era). Of course, we would need to preserve the values of all existing error code constants.

Changing the error code of the exception being thrown, and/or removing the identifier error_syntax, could be done without realistically disrupting any code (it is unlikely that code is out there that is depending on the exact error code value being thrown for bogus syntax, or mentioning our non-Standard undocumented error constant).