textX / Arpeggio

Parser interpreter based on PEG grammars written in Python http://textx.github.io/Arpeggio/
Other
268 stars 55 forks source link

Question to the possible design of the reporting functionality #103

Closed stanislaw closed 1 year ago

stanislaw commented 1 year ago

Hello again,

This issue is my attempt to build on top of the previous discussion here. If a GitHub issue is not the right way to discuss these things, would it make sense to maybe activate GitHub Discussions, where more general things could be discussed?

I think that having a hierarchy of NoMatch classes is a good idea. As you already mentioned, having tuples in return values and using isinstance is not very elegant but I don't have a better idea to suggest at the moment.

One idea I had was to wrap the parsing result into a class that carries both result and the "weakly failed rules".

Examples of weakly failed rules producers are at least Optional, Sequence, OrderedChoice.

class ParseResult:
    def __init__(self, result: Optional, weakly_failed_rules: Optional[List]):
        self.result: Optional = result
        self.weakly_failed_rules: Optional[List] = weakly_failed_rules

Example: The Optional would then return a ParseResult(None, [<failed match expression>]).

How ParseResult would help? Currently, in the master branch, the failed expressions are cached ad-hoc in the method _nm_raise. The problem with that approach is that not all cases are caught in the nm.rules, and therefore the current reporting is rather limited. The code that calls: parse_result: ParseResult = self._parse() would then make a check if there are any weakly_failed_rules and collect them in something like parser.weakly_failed_rules. The weakly_failed_rules should be reset when there is an subsequent expression that parses successfully. If a parser fails, the Expected ... diagnostics should report on both weakly_failed_rules and the final failed expression.

I can currently not think of a better way of doing this, if the migration to the composite NoMatch design is to be made.

The advantage of maintaining ParseResult with weakly_failed_rules would be that the results will be very explicit, without limitations of the reentrant nm_raise.

The disadvantage is that not all ParsingExpressions are producers of weakly failed rules, so there will be redundancy in checking.

I would be curious to hear your thoughts on this.

I'm also wondering how big of a performance impact it might be. I guess you can continue in that direction and we'll figure out something along the way.

I don't think the performance impact with composite NoMatch/rules is much bigger than it was. An important point is that the creation of these composite classes happens only during the failure reporting time, and the positive course of the implementation is almost unmodified. The ParseResult code would introduce a need in extra if checks, so that might be a more serious contributor. I think we have to weigh the benefits of improved reporting versus a few extra lines of code.


There is one other thing that I noticed, and it looks like a little workaround (a hack?) around the limitations of current result = self._parse(parser) interface.

OrderedChoice:

                    match = True
                    result = [result]
                    break

and then:

            result = self._parse(parser)
            if self.suppress or (type(result) is list and
                                 result and result[0] is None):
                result = None

It looks like you are using somewhat a hack to handle a specific case by introducing a custom result type. Could this hack be also somehow replaced by a field in a more explicit ParseResult class?