vlasovskikh / funcparserlib

Recursive descent parsing library for Python based on functional combinators
https://funcparserlib.pirx.ru
MIT License

Force define() method to set a lazy evaluator in parser object. #42

Closed magniff closed 8 years ago

magniff commented 8 years ago

The problem is straightforward: if you call parser0.define(parser1) and parser1 hasn't been defined yet, define() just copies parser1's dummy run method implementation. I don't think this is how it is meant to work. The following small fix makes the evaluation cascade lazier.
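To make the difference concrete, here is a minimal sketch using toy classes. `Forward`, `EagerForward`, `LazyForward`, `Literal` and `outcome` are hypothetical names invented for this illustration; they model the behaviour described above and are not funcparserlib's actual implementation.

```python
class Forward:
    """Toy forward declaration: a placeholder until define() is called."""

    def run(self, tokens):
        raise NotImplementedError("forward declaration was never defined")


class EagerForward(Forward):
    def define(self, target):
        # Eager: snapshot whatever `run` the target has right now.
        # If the target is itself still a placeholder, we keep its dummy run.
        self.run = target.run


class LazyForward(Forward):
    def define(self, target):
        # Lazy: look up target.run only when parsing actually happens.
        self.run = lambda tokens: target.run(tokens)


class Literal:
    """Toy 'real' parser that accepts exactly one expected token."""

    def __init__(self, value):
        self.value = value

    def run(self, tokens):
        if tokens and tokens[0] == self.value:
            return tokens[0]
        raise ValueError("unexpected input")


def outcome(forward_cls):
    parser0, parser1 = forward_cls(), forward_cls()
    parser0.define(parser1)       # parser1 is still a placeholder here
    parser1.define(Literal("x"))  # the real definition arrives later
    try:
        return parser0.run(["x"])
    except NotImplementedError:
        return "undefined"


print(outcome(EagerForward))
print(outcome(LazyForward))
```

With the eager variant, parser0 stays bound to the placeholder's dummy method even after parser1 receives its real definition. The lazy variant resolves the target at parse time, so the order of the define() calls stops mattering.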

magniff commented 8 years ago

@vlasovskikh please review :)

vlasovskikh commented 8 years ago

Why do you need to define a forward declaration parser using another forward declaration, not a real parser?

magniff commented 8 years ago

@vlasovskikh It actually happens when your grammar has loops, like this one. Have a look at the location and expr nonterminals: they depend on each other. It is not a big deal if you implement the parser as a single class or module, but it is if your parser's components are decomposed and spread across the project, i.e. over multiple packages. In that case, sooner or later you will get a module dependency loop. Surely the language designer (an MIT professor in this case) could rework the grammar to avoid this, but sometimes the grammar rules are not known in advance, say if you are using import hooks or metaprogramming.

It turns out that splitting a parser's declaration from its implementation helps. Say each nonterminal has its own package; just add a declaration module to it that contains the parser declaration, like p_location = funcparserlib.parser.forward_decl(). With this approach we can reuse the p_location object without import loops. Have a look at this sample project structure:

parsers
  __init__.py
  location
    parser.py        # defines p_location and imports expression.declaration.p_expression
    declaration.py   # declares p_location
  expression
    parser.py        # defines p_expression and imports location.declaration.p_location
    declaration.py   # declares p_expression

Now, to define p_location in location.parser, we can do from .declaration import p_location. To use the declaration of p_location in expression.parser, just do from ..location.declaration import p_location. One last thing: to make sure the parsers package is built correctly for external users, we should add something like this to parsers.__init__:

from .location.parser import p_location
from .expression.parser import p_expression

The code above guarantees that the 'definition code' for each parser is invoked.
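As a rough check that this layout imports without a cycle, the sketch below writes the package to a temporary directory and parses a small input. Everything in it is an assumption made for illustration: the `forward.py` module and its `Forward` class are a toy stand-in for a lazy forward declaration (not funcparserlib's forward_decl()), and the two-rule grammar and token format are invented.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical file contents for the split-package layout.
FILES = {
    "parsers/__init__.py": """
        from .location.parser import p_location
        from .expression.parser import p_expression
    """,
    "parsers/forward.py": """
        class Forward:
            # Toy lazy forward declaration (hypothetical helper).
            def define(self, target):
                self.target = target

            def run(self, tokens):
                return self.target(tokens)
    """,
    "parsers/location/__init__.py": "",
    "parsers/location/declaration.py": """
        from ..forward import Forward
        p_location = Forward()
    """,
    "parsers/location/parser.py": """
        from .declaration import p_location
        from ..expression.declaration import p_expression

        def location(tokens):
            # location := NAME | NAME '[' expression ']'
            name, rest = tokens[0], tokens[1:]
            if rest and rest[0] == '[':
                index, rest = p_expression.run(rest[1:])
                return (name, index), rest[1:]  # drop the closing ']'
            return name, rest

        p_location.define(location)
    """,
    "parsers/expression/__init__.py": "",
    "parsers/expression/declaration.py": """
        from ..forward import Forward
        p_expression = Forward()
    """,
    "parsers/expression/parser.py": """
        from .declaration import p_expression
        from ..location.declaration import p_location

        def expression(tokens):
            # expression := NUMBER | location
            if tokens[0].isdigit():
                return int(tokens[0]), tokens[1:]
            return p_location.run(tokens)

        p_expression.define(expression)
    """,
}


def parse_with_split_package(text):
    """Write the package to disk, import it, and parse `text` as a location."""
    with tempfile.TemporaryDirectory() as root:
        for rel_path, source in FILES.items():
            path = Path(root) / rel_path
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(textwrap.dedent(source))
        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            parsers = importlib.import_module("parsers")
            return parsers.p_location.run(text.split())
        finally:
            sys.path.remove(root)


result = parse_with_split_package("a [ b [ 1 ] ]")
print(result)
```

Each parser module imports only declaration modules, never another parser module, so the import graph stays acyclic even though the grammar rules are mutually recursive.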

vlasovskikh commented 8 years ago

Having recursive definitions spread among several modules is usually a bad practice. I would rather keep the forward declaration parser in the library simple.

Consider monkey-patching the forward declaration parser or switching to a single module structure for your grammar.

arefiev commented 8 years ago

The strongest point of recursive-descent parsing is that it allows composability unattainable with classic parser generators, and being able to split anything into multiple modules once your single module becomes large enough is definitely a huge plus. Not having non-eager forward decls prevents that in some cases, such as the one described by @magniff above. Are there any cons to this approach, other than possible weird AttributeErrors popping out in strange places?