jcoglan / canopy

A parser compiler for Java, JavaScript, Python, Ruby
http://canopy.jcoglan.com
Mozilla Public License 2.0
417 stars 54 forks source link

A suggestion regarding %make node builders #47

Open patb2 opened 3 years ago

patb2 commented 3 years ago

First off, thanks a lot for sharing this development, that is the best I have found for this purpose, and the fact that it addresses various target environments is really awesome. I have a suggestion : it could be a good idea, and quite a simple one, to allow for a generic %make function, that would receive the node type as an argument. For the moment, we need to define as many %make_xxxx functions as there are node types to be handled, but all my %make_xxxx functions do exactly the same thing: they instanciate a custom Node object, the only difference being the node type. So it's like:

class Actions(object):

    def make_number(self, input, start, end, elements):
        return ParseNode('number', input, start, end, elements)

    def make_float(self, input, start, end, elements):
        return ParseNode('float', input, start, end, elements)

    def make_variable(self, input, start, end, elements):
        return ParseNode('variable', input, start, end, elements)

    def make_expression(self, input, start, end, elements):
        return ParseNode('expression', input, start, end, elements)

    # and so on, for about thirty node types.

I suspect it will be quite common to instanciate the same object type, with the node type as an argument. So if the node type was among the arguments of the %make functions, we could define a single %make function.

jcoglan commented 2 years ago

I think this shares some of the problems of #7 in that parsing expressions do not have names. A grammar binds rule names to parsing expressions, which are arbitrarily nested, like expressions in a programming language. The %action syntax is bound to these nested expressions, not to rule names. So parsing expressions and their results are really anonymous, there isn't a "type" that we could track and pass through to these functions.

It is possible in theory for us to assign a "type" to nodes based on the rule where they are initially created, where their parsing expression is. However I think this would end up being more confusing, because referencing conceptually "moves" nodes between rules. Some rules generate one of many different types via a choice like a / b / c -- should the nodes this expression returns retain their original type, or the type of the referencing rule? How should this affect optional nodes, or repeitions?

It would also make grammars harder to refactor; grammar rules are effectively function definitions, and imagine a programming language where every value had the name of the function that generated it attached to it. If you refactored by moving code between functions, or splitting functions into smaller units, you'd need to change how all their values are handled. So adding this "type" information strongly couples the grammar to the program that consumes it, in a way that I believe would cause more hassle than its worth.

In cases like yours, there will usually by logic somewhere that cases about the "type" you've assigned to each ParseNode and cause different behaviour. I think that logic should live in the parser by causing different types of object to be created, and the different behaviours should be attached to those objects -- the parser is the place you make these decisions.

jcoglan commented 2 years ago

As mentioned in https://github.com/jcoglan/canopy/issues/7#issuecomment-1084458260 I've added a limited form of rule name capture in order to improve error messages. I am still not sold on adding rule names to parse tree nodes, for the reasons mentioned there and on this thread. I will keep it under review for future releases, however.