peter-winter / ctpg

Compile Time Parser Generator is a C++ single-header library which takes a language description as C++ code and turns it into an LR(1) table parser with a deterministic finite automaton lexical analyzer, all at compile time.
MIT License

Would it be possible to remove the lexer scanner? #53

Open 95833 opened 1 year ago

95833 commented 1 year ago

I am writing a grammar with another parser library, and I find the separate lexer (scanner) stage unnatural. When we define a token, we usually give it a name that carries semantics, such as VARIABLE, STRING, INT, FLOAT or BOOL. This is unnatural because the lexer should not carry any semantic information. Names such as LITTTLE_CHAR_SET, CHARS_SET_WITH_QUOTES and DIGIT_SET would be more accurate than VARIABLE, STRING and INT, but they are obviously too verbose. It may seem unimportant, but whenever I define a grammar I have to trade off between a natural but complex grammar and a simple but incoherent one, because the places that use the same token often have different semantics.

So I wonder whether we could get a natural grammar definition by removing the lexer/scanner and replacing lexer tokens with inline regexes. I thought of your library because this approach seems to suit it: it could also make up for the lexer priority problem.

peter-winter commented 1 year ago

The problem is that the parser is supposed to be a constexpr object. This is the whole idea behind the library.

Now there are some problems:

For char_term and string_term it is easy. For the regex term I found a way, but it requires the C++20 standard:

#include <algorithm>  // std::ranges::copy
#include <cstddef>    // std::size_t

template<std::size_t N>
struct regex
{
    constexpr regex(const char (&str)[N])
    {
        std::ranges::copy(str, array);
    }

    char array[N];
};

template<regex a>
constexpr auto operator ""_r()
{
    return a;
}

int main()
{
    constexpr auto expr = "[0-9]"_r;
    return 0;
}

Of course I could allow inlining them like this: regex_term("[0-9]"), but this seemed too verbose and made the grammar look ugly.

95833 commented 1 year ago

The purpose of inlining is to resolve lexer matching priority as part of the parsing process itself. I don't know whether that can be done, or how. The writing style, on the other hand, is not very important.
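One possible reading of this request is context-aware scanning: instead of a global lexer with fixed priorities, the scanner tries only the terminals the parser can accept in its current state. The sketch below is an assumption about what is meant, not a description of ctpg. The names term, match_if, match_identifier and next_term are hypothetical, and the matchers are hand-written rather than regex-driven.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical terminal kinds for illustration; not ctpg identifiers.
enum class term { keyword_if, identifier, none };

// Length of the longest prefix of `in` made of lowercase letters.
constexpr std::size_t match_identifier(std::string_view in)
{
    std::size_t n = 0;
    while (n < in.size() && in[n] >= 'a' && in[n] <= 'z')
        ++n;
    return n;
}

// Length of the literal "if" when `in` starts with it, otherwise 0.
constexpr std::size_t match_if(std::string_view in)
{
    return in.substr(0, 2) == "if" ? 2 : 0;
}

// Tries only the terminals the parser reports as acceptable in its
// current state (passed in here as two flags for simplicity).
constexpr term next_term(std::string_view in, bool accepts_if, bool accepts_identifier)
{
    if (accepts_if && match_if(in) > 0)
        return term::keyword_if;
    if (accepts_identifier && match_identifier(in) > 0)
        return term::identifier;
    return term::none;
}

// Same input, different parser state, different terminal.
static_assert(next_term("if", true, true) == term::keyword_if);
static_assert(next_term("if", false, true) == term::identifier);
static_assert(next_term("42", true, true) == term::none);
```

In this sketch a priority rule is only needed when a single parser state accepts both terminals. In states that accept just one of them, the ambiguity never arises.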