hanickadot / compile-time-regular-expressions

Compile Time Regular Expression in C++
https://twitter.com/hankadusikova
Apache License 2.0

`[^\w\W]+` does not compile #284

Open iulian-rusu opened 1 year ago

iulian-rusu commented 1 year ago

I was trying to write a regex that never matches anything, using the `[^\w\W]` syntax. It compiled on its own, but failed to compile once I applied a greedy quantifier to it.

After experimenting some more, I found that the issue appears whenever a negated character class such as `\W` or `\D` is used inside a negated set that is quantified with a greedy operator. The lazy and possessive versions compile just fine.
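For context on why `[^\w\W]` can never match: `\w` and `\W` are complements, so their union covers every character and negating that union leaves an empty set. A minimal sketch of this reasoning follows. It assumes the ASCII definition of `\w` that appears in the error output (`A-Z`, `a-z`, `0-9`, `_`); the helper names `is_word`, `in_negated_union`, and `any_ascii_matches` are hypothetical and are not part of CTRE.

```cpp
// Hypothetical helper: ASCII-only model of \w, i.e. [A-Za-z0-9_].
constexpr bool is_word(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '_';
}

// Hypothetical helper: models [^\w\W], i.e. "in neither \w nor \W".
constexpr bool in_negated_union(char c) {
    const bool in_w = is_word(c);        // \w
    const bool in_not_w = !is_word(c);   // \W, the complement of \w
    return !(in_w || in_not_w);          // negation of the union
}

// Returns true if any ASCII character (0..127) belongs to the modeled set.
constexpr bool any_ascii_matches() {
    for (int c = 0; c < 128; ++c) {
        if (in_negated_union(static_cast<char>(c))) {
            return true;
        }
    }
    return false;
}

// Checked at compile time: no ASCII character is in the set.
static_assert(!any_ascii_matches(), "[^\\w\\W] should be empty");
```

This only models the semantics of the set itself; it says nothing about how CTRE represents or optimizes it internally.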

Godbolt: https://godbolt.org/z/edKfYeqo9

Here's the error:

In file included from <source>:1:
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4152:10: error: assigning to 'int64_t' (aka 'long') from incompatible type 'void'
        start = negative_helper(Head{}, cb, start);
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4153:2: note: in instantiation of function template specialization 'ctre::negative_helper<ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, (lambda at /app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4294:25)>' requested here
        negative_helper(ctre::negative_set<Rest...>{}, std::forward<CB>(cb), start);
        ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4294:3: note: in instantiation of function template specialization 'ctre::negative_helper<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, (lambda at /app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4294:25)>' requested here
                negative_helper(nset, [&](int64_t low, int64_t high){
                ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4302:4: note: in instantiation of function template specialization 'ctre::point_set<10>::populate<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>' requested here
                (populate(Content{}), ...);
                 ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4310:6: note: in instantiation of function template specialization 'ctre::point_set<10>::populate<ctre::negative_set<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>>' requested here
        set.populate(rhs);
            ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4700:22: note: in instantiation of function template specialization 'ctre::collides<ctre::negative_set<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>>' requested here
        else if constexpr (!collides(calculate_first(Content{}...), calculate_first(Tail{}...))) {
                            ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4372:9: note: (skipping 2 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
        return evaluate(begin, current, last, f, captures.set_start_mark(current), ctll::list<Tail...>());
               ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:5273:10: note: in instantiation of function template specialization 'ctre::match_method::exec<ctll::list<ctre::singleline>, void, ctre::repeat<1, 0, ctre::negative_set<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>>, const char *, const char *>' requested here
                return exec<Modifier, ResultIterator>(begin, begin, end, RE{});
                       ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:5373:27: note: in instantiation of function template specialization 'ctre::match_method::exec<ctll::list<ctre::singleline>, void, ctre::repeat<1, 0, ctre::negative_set<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>>, const char *, const char *>' requested here
                return Method::template exec<Modifier>(begin, end, RE{});
                                        ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:5382:10: note: in instantiation of function template specialization 'ctre::regular_expression<ctre::repeat<1, 0, ctre::negative_set<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>>, ctre::match_method, ctll::list<ctre::singleline>>::exec<const char *, const char *>' requested here
                return exec(sv.begin(), sv.end());
                       ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:5404:10: note: in instantiation of member function 'ctre::regular_expression<ctre::repeat<1, 0, ctre::negative_set<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>>, ctre::match_method, ctll::list<ctre::singleline>>::exec' requested here
                return exec(std::forward<Args>(args)...);
                       ^
<source>:4:38: note: in instantiation of function template specialization 'ctre::regular_expression<ctre::repeat<1, 0, ctre::negative_set<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>>, ctre::match_method, ctll::list<ctre::singleline>>::operator()<std::basic_string_view<char> &>' requested here
    return ctre::match<R"([^\w\W]+)">(s);
                                     ^
In file included from <source>:1:
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4700:21: error: constexpr if condition is not a constant expression
        else if constexpr (!collides(calculate_first(Content{}...), calculate_first(Tail{}...))) {
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4372:9: note: in instantiation of function template specialization 'ctre::evaluate<ctre::regex_results<const char *>, const char *, const char *, const char *, 1UL, 0UL, ctre::negative_set<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>, ctre::assert_subject_end, ctre::end_mark, ctre::accept>' requested here
        return evaluate(begin, current, last, f, captures.set_start_mark(current), ctll::list<Tail...>());
               ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:5269:10: note: in instantiation of function template specialization 'ctre::evaluate<ctre::regex_results<const char *>, const char *, const char *, const char *, ctre::repeat<1, 0, ctre::negative_set<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>>, ctre::assert_subject_end, ctre::end_mark, ctre::accept>' requested here
                return evaluate(orig_begin, begin, end, Modifier{}, return_type<result_iterator, RE>{}, ctll::list<start_mark, RE, assert_subject_end, end_mark, accept>());
                       ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:5273:10: note: in instantiation of function template specialization 'ctre::match_method::exec<ctll::list<ctre::singleline>, void, ctre::repeat<1, 0, ctre::negative_set<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>>, const char *, const char *>' requested here
                return exec<Modifier, ResultIterator>(begin, begin, end, RE{});
                       ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:5373:27: note: in instantiation of function template specialization 'ctre::match_method::exec<ctll::list<ctre::singleline>, void, ctre::repeat<1, 0, ctre::negative_set<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>>, const char *, const char *>' requested here
                return Method::template exec<Modifier>(begin, end, RE{});
                                        ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:5382:10: note: in instantiation of function template specialization 'ctre::regular_expression<ctre::repeat<1, 0, ctre::negative_set<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>>, ctre::match_method, ctll::list<ctre::singleline>>::exec<const char *, const char *>' requested here
                return exec(sv.begin(), sv.end());
                       ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:5404:10: note: in instantiation of member function 'ctre::regular_expression<ctre::repeat<1, 0, ctre::negative_set<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>>, ctre::match_method, ctll::list<ctre::singleline>>::exec' requested here
                return exec(std::forward<Args>(args)...);
                       ^
<source>:4:38: note: in instantiation of function template specialization 'ctre::regular_expression<ctre::repeat<1, 0, ctre::negative_set<ctre::set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>, ctre::negative_set<ctre::set<ctre::char_range<'A', 'Z'>, ctre::char_range<'a', 'z'>, ctre::char_range<'0', '9'>, ctre::character<'_'>>>>>, ctre::match_method, ctll::list<ctre::singleline>>::operator()<std::basic_string_view<char> &>' requested here
    return ctre::match<R"([^\w\W]+)">(s);
                                     ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4153:2: note: subexpression not valid in a constant expression
        negative_helper(ctre::negative_set<Rest...>{}, std::forward<CB>(cb), start);
        ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4294:3: note: in call to 'negative_helper({}, [&](int64_t low, int64_t high) {
    this->insert(low, high);
}, 96)'
                negative_helper(nset, [&](int64_t low, int64_t high){
                ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4302:4: note: in call to '&set->populate({})'
                (populate(Content{}), ...);
                 ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4310:6: note: in call to '&set->populate({})'
        set.populate(rhs);
            ^
/app/raw.githubusercontent.com/hanickadot/compile-time-regular-expressions/main/single-header/ctre.hpp:4700:22: note: in call to 'collides({}, {})'
        else if constexpr (!collides(calculate_first(Content{}...), calculate_first(Tail{}...))) {
                            ^
2 errors generated.
Compiler returned: 1

hiteshbedre commented 1 year ago

Try `[^\\w\\W]`; otherwise, use a regex escaping library.

iulian-rusu commented 1 year ago

I'm already using raw C++ string literals (see the attached Godbolt link). The regex is parsed correctly, as can be seen in the error message.
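To illustrate the point about raw string literals, here is a small standard-library-only sketch (no CTRE involved) showing that the raw literal and the manually escaped literal hold the same characters, so double escaping is not needed inside `R"(...)"`. The variable names are chosen only for this example.

```cpp
#include <string_view>

// Raw string literal: backslashes are taken literally.
constexpr std::string_view raw_pattern = R"([^\w\W]+)";

// Ordinary string literal: each backslash must be escaped.
constexpr std::string_view escaped_pattern = "[^\\w\\W]+";

// Both spell the same eight characters: [ ^ \ w \ W ] +
static_assert(raw_pattern == escaped_pattern, "patterns should be identical");
static_assert(raw_pattern.size() == 8, "pattern has eight characters");
```

Writing `\\w` inside a raw literal would instead produce two backslashes in the pattern, which is a different regex.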