Would it be possible to remove the lexer scanner?

95833 commented 1 year ago

I am writing a grammar using another parser library, and I find the separate lexer/scanner unnatural. When we define a token, we usually give it a name that carries some semantics, such as VARIABLE, STRING, INT, FLOAT, or BOOL. This is unnatural because the lexer should not carry any semantic information. Names like LITTLE_CHAR_SET, CHARS_SET_WITH_QUOTES, or DIGIT_SET would be more fitting replacements for VARIABLE, STRING, and INT, but they are obviously too verbose. This may seem unimportant, but whenever I define a grammar I have to make a tradeoff between a natural but complex grammar and a simple but incoherent one, because the places that use the same token often have different semantics.

So I am wondering whether we could get a natural grammar definition by removing the lexer/scanner and replacing lexer tokens with inline regexes. I also thought of your library here, and I feel this approach would suit it because it could help with the lexer priority problem.

peter-winter commented 1 year ago

The problem is that the parser is supposed to be a constexpr object. This is the whole idea behind the library.

Now there are some problems:

I need to calculate the size of a finite automaton table to construct a lexer, so...
I need the sizes of all regexes at compile time
I would like to allow inline terms but only if they are expressed as literals, like say "[0-9]"_r
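
To make the size requirement concrete, here is a minimal sketch (not ctpg code) of how pattern lengths can be recovered at compile time when patterns are passed as string literals. The names `pattern_length` and `total_length` are hypothetical, and the sum of pattern lengths is only a stand-in for whatever bound a real automaton table would need:

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: length of a pattern literal without the trailing '\0'.
// N is deduced from the array type, so it is known at compile time.
template<std::size_t N>
constexpr std::size_t pattern_length(const char (&)[N])
{
    return N - 1;
}

// Hypothetical helper: combined length of several pattern literals.
template<std::size_t... Ns>
constexpr std::size_t total_length(const char (&...patterns)[Ns])
{
    return (pattern_length(patterns) + ... + 0);
}

// The result is a constant expression, so it can size an array.
// (Assumption: a real lexer table would use a different bound.)
constexpr std::size_t table_size = total_length("[0-9]+", "[a-z]+");
static_assert(table_size == 12);

constexpr std::array<int, table_size> table{};
```

This only works because each literal's length is part of its type. A pattern passed as a plain `const char*` would not carry its size, which is why inline terms would have to be literals.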

For char_term and string_term it is easy. For the regex term I found a way, but it requires the C++20 standard:

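#include <algorithm>
#include <cstddef>

// Structural type, so it can be used as a non-type template parameter (C++20).
template<std::size_t N>
struct regex
{
    constexpr regex(const char (&str)[N])
    {
        std::ranges::copy(str, array);
    }

    char array[N];
};

// String literal operator template: the literal becomes the template argument.
template<regex a>
constexpr auto operator ""_r()
{
    return a;
}

int main()
{
    constexpr auto expr = "[0-9]"_r;
    return 0;
}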

Of course I could allow inlining them like this: regex_term("[0-9]"), but this seemed too verbose and the grammar looked ugly.

95833 commented 1 year ago

The goal of inlining is to resolve lexer matching priority as part of the parsing process itself. I don't know whether this can be realized or how to realize it. The writing style, on the other hand, is not very important.
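
One possible reading of this idea, sketched below under my own assumptions, is that the parser state decides which terminals are tried at the current position, so two terminals that never compete in the same state need no global priority. Nothing here is ctpg API; `terminal`, `expected`, and `match` are hypothetical names used only for illustration:

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical terminal kinds, for illustration only.
enum class terminal { keyword_if, identifier, none };

// Length of the run of lowercase letters at the start of `text`.
constexpr std::size_t identifier_length(std::string_view text)
{
    std::size_t n = 0;
    while (n < text.size() && text[n] >= 'a' && text[n] <= 'z')
        ++n;
    return n;
}

// Flags describing which terminals the parser can accept in its current state.
struct expected
{
    bool keyword_if;
    bool identifier;
};

// Context-aware matching: only terminals allowed by the parser state are tried.
constexpr terminal match(std::string_view text, expected allowed)
{
    const std::size_t n = identifier_length(text);
    if (allowed.keyword_if && n == 2 && text.substr(0, 2) == "if")
        return terminal::keyword_if;
    if (allowed.identifier && n > 0)
        return terminal::identifier;
    return terminal::none;
}

// The same input yields different terminals depending on the parser state.
static_assert(match("if x", {true, false}) == terminal::keyword_if);
static_assert(match("if x", {false, true}) == terminal::identifier);
```

When a state allows both terminals, this sketch still falls back to a fixed order (the keyword is tried first), so context narrows the priority problem without removing it entirely.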

peter-winter / ctpg

Would it be possible to remove the lexer scanner? #53