peter-winter / ctpg

Compile Time Parser Generator is a C++ single header library which takes a language description as a C++ code and turns it into a LR1 table parser with a deterministic finite automaton lexical analyzer, all in compile time.
MIT License
456 stars 24 forks source link

Fix: iterator is extended over end of string if no match occurs with `dfa_match` #77

Open moritz-geier opened 5 months ago

moritz-geier commented 5 months ago

Hello,

I got an error on Windows 11 and MSVC 19.39 where the parser fails. This happens if the dfa_match returns uninitialized16 and we try to add it to the iterator. Thus here my proposed fix.

To test the behavior create the following parser

    ///////////////////////////////////////////////////////////////////////////
    static constexpr ctpg::nterm<bool> conditionExpression{"condition"};
    static constexpr ctpg::char_term trueLiteral{'t'};
    static constexpr ctpg::char_term falseLiteral{'f'};

    ///////////////////////////////////////////////////////////////////////////
    static constexpr auto root = conditionExpression;

    ///////////////////////////////////////////////////////////////////////////
    static constexpr auto terminals = ctpg::terms(
        trueLiteral,
        falseLiteral
    );

    ///////////////////////////////////////////////////////////////////////////
    static constexpr auto nonTerminals = ctpg::nterms(
        conditionExpression
    );

    ///////////////////////////////////////////////////////////////////////////
    static constexpr ctpg::parser parser{
        root,
        terminals,
        nonTerminals,
        rules(
            conditionExpression(trueLiteral) >=
                [](const auto&) { return true; },
            conditionExpression(falseLiteral) >=
                [](const auto&) { return false; }
        )
    };

and try to parse

parser.parse(ctpg::buffers::cstring_buffer{"tr"});
moritz-geier commented 5 months ago

This is a dirty fix, in my opinion we should use std::distance within dfa_match to initiate rt.len , thus eliminating the need for the ifs afer calling the function. This would mean we need to enforce the iterators of each buffer to provide a operator- implementation.

peter-winter commented 5 months ago

First of all I am reluctant to commit any new work until I merge my term groups feature into master. The only thing left to do (as always) is msvc compilation, which I just don't have time to do at the moment.

I would gladly accept help regarding this topic.

The feature is very nice itself, it allows much smaller parsers and compile times when grouping similar terminal symbols. C language for instance has 11 assignment operators which from parser point of view are the same.

Second thing, after I'm done with the grouped terms feature, we can think of a better solution (you already mentioned a nice one) not this dirty one.

Would you be willing to work on making the term-groups branch work on msvc?

moritz-geier commented 5 months ago

Yes I can try and get it working.