textX / Arpeggio

Parser interpreter based on PEG grammars written in Python http://textx.github.io/Arpeggio/

Support for injecting extra rules written in Python into PEG grammars #29

Open ninmesara opened 8 years ago

ninmesara commented 8 years ago

It would be useful to be able to inject rules written as Python functions into PEG grammars. This would accomplish two things:

  1. Greater portability for libraries. I could publish a library of Python functions that anyone could use, regardless of whether they're using the peg, cleanpeg or Python parsers. Python functions, although more cumbersome to write, are more composable.
  2. It would allow the user to write special rules that respect whitespace in PEG files, while skipping whitespace in the rest of the rules. I believe this is currently impossible without rewriting the whole grammar in Python.
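The second point can be made concrete with a toy sketch in plain Python. It does not use Arpeggio; `lit`, `seq`, `raw` and `parse` are hypothetical helpers. Terminals skip leading whitespace by default, while a `raw` wrapper turns skipping off inside the rule it wraps, which is the mixed behaviour being asked for.

```python
import re

# Hypothetical toy matchers (NOT Arpeggio's API). Each matcher takes
# (text, pos, skipws) and returns the new position, or None on failure.
WS = re.compile(r"\s*")


def lit(s):
    """Match the literal `s`, skipping leading whitespace when skipws is on."""
    def match(text, pos, skipws):
        if skipws:
            pos = WS.match(text, pos).end()
        return pos + len(s) if text.startswith(s, pos) else None
    return match


def seq(*parts):
    """Match all `parts` in order."""
    def match(text, pos, skipws):
        for part in parts:
            pos = part(text, pos, skipws)
            if pos is None:
                return None
        return pos
    return match


def raw(inner):
    """Whitespace-sensitive rule: skip whitespace before it, never inside it."""
    def match(text, pos, skipws):
        if skipws:
            pos = WS.match(text, pos).end()
        return inner(text, pos, False)
    return match


def parse(rule, text):
    """True if `rule` consumes the whole input (trailing whitespace allowed)."""
    end = rule(text, 0, True)
    return end is not None and WS.match(text, end).end() == len(text)


# "ab" must be written without inner spaces; the rest of the rule is free-form.
stmt = seq(lit("let"), raw(seq(lit("a"), lit("b"))), lit(";"))
```

Here `parse(stmt, "let  ab ;")` succeeds while `parse(stmt, "let a b;")` fails, because the space between `a` and `b` falls inside the `raw` rule.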

I'd suggest the following API:

from lib.external import rule1, rule2
from arpeggio.cleanpeg import ParserPEG
parser = ParserPEG(calc_grammar, "calc",
                   extra_rules={'rule_name1': rule1, 'rule_name2': rule2})

The user could then use 'rule_name1' and 'rule_name2' in the grammar file, and the rules would be resolved automatically. There might be a problem with name clashes between user-defined rules and the inner rules defined by the external functions, though; I'm not familiar enough with Arpeggio's internals to be sure.
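A minimal sketch of the proposed `extra_rules` idea, assuming rules are looked up by name when the parser is built. This is a toy model in plain Python, not Arpeggio's internals; `lit`, `regex`, `ref`, `seq` and `build_parser` are hypothetical. It also shows one simple answer to the name-clash concern: refuse to merge a rule whose name is already taken.

```python
import re

# Hypothetical toy (NOT Arpeggio's API). A matcher is a callable
# (text, pos, rules) -> new position or None; `rules` maps names to matchers.


def lit(s):
    return lambda text, pos, rules: pos + len(s) if text.startswith(s, pos) else None


def regex(pattern):
    rx = re.compile(pattern)

    def match(text, pos, rules):
        m = rx.match(text, pos)
        return m.end() if m else None
    return match


def ref(name):
    # Resolved by name at match time, so the rule may be defined elsewhere.
    return lambda text, pos, rules: rules[name](text, pos, rules)


def seq(*parts):
    def match(text, pos, rules):
        for part in parts:
            pos = part(text, pos, rules)
            if pos is None:
                return None
        return pos
    return match


def build_parser(grammar_rules, root, extra_rules=None):
    """Merge externally written rules into the grammar, refusing name clashes."""
    rules = dict(grammar_rules)
    for name, rule in (extra_rules or {}).items():
        if name in rules:
            raise ValueError(f"rule name clash: {name!r}")
        rules[name] = rule

    def parse(text):
        return rules[root](text, 0, rules) == len(text)
    return parse


# "calc" stands in for rules read from a PEG file; "number" is injected.
calc_rules = {"calc": seq(ref("number"), lit("+"), ref("number"))}
parse = build_parser(calc_rules, "calc", extra_rules={"number": regex(r"\d+")})
```

With these definitions `parse("1+22")` returns `True`, and passing an extra rule with the same name as an existing one raises `ValueError`. Rejecting clashes is only one possible policy; letting the later rule win is another.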

igordejanovic commented 8 years ago

I'm planning a more general approach to parser composability.

Something like this:

from lib.external import rule1, rule2
from arpeggio import GrammarPython, GrammarPEG, GrammarCPEG, Parser

...
parser = Parser(GrammarPython(calc), GrammarPEG(calc_override_in_peg),
                GrammarPython(rule1, rule2), GrammarCPEG(clean_peg_addition))

Grammar* callables will know how to read grammars written in different styles and transform them into the internal grammar representation understood by the Parser class. Parser will do grammar composition and full resolution using some predetermined override rule (e.g. rules that come later in the grammar list override earlier rules with the same name).
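The override rule can be sketched as an ordered merge of rule tables, assuming rule references are resolved by name at match time, so that an override also affects every rule that refers to the overridden name. This is a plain Python toy; `compose`, `matches` and the matcher helpers are hypothetical and are not the planned `Parser` API.

```python
import re

# Hypothetical toy (NOT Arpeggio's API): a grammar is a dict of
# name -> matcher, and a matcher is (text, pos, rules) -> new pos or None.


def lit(s):
    return lambda text, pos, rules: pos + len(s) if text.startswith(s, pos) else None


def regex(pattern):
    rx = re.compile(pattern)

    def match(text, pos, rules):
        m = rx.match(text, pos)
        return m.end() if m else None
    return match


def ref(name):
    # Looked up in the composed rule table, so overrides take effect here too.
    return lambda text, pos, rules: rules[name](text, pos, rules)


def seq(*parts):
    def match(text, pos, rules):
        for part in parts:
            pos = part(text, pos, rules)
            if pos is None:
                return None
        return pos
    return match


def compose(*grammars):
    """Merge grammars in order; a later rule overrides an earlier one of the same name."""
    rules = {}
    for grammar in grammars:
        rules.update(grammar)
    return rules


def matches(rules, root, text):
    return rules[root](text, 0, rules) == len(text)


base = {"calc": seq(ref("number"), lit("+"), ref("number")), "number": lit("1")}
override = {"number": regex(r"\d+")}  # plays the role of calc_override
```

`compose(base)` accepts only `"1+1"`, while `compose(base, override)` also accepts `"12+3"`, because `calc` picks up the overriding `number` rule.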

With this approach you could mix and match grammars written in different styles. E.g., you could do the override in PEG, in clean PEG, or in some other form. You could write your own Grammar* wrapper, specify the grammar however you see fit, and still be able to compose it with other grammars.

Grammars could be incomplete, i.e. rules could reference nonexistent rules, thus providing a kind of extension point. Of course, when the final parser is formed, all the rules must be available.
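Allowing incomplete grammars mainly moves the completeness check to the point where the final parser is formed. A toy sketch, assuming grammars are dicts of tuple-encoded expressions; the encoding and the `refs` and `finalize` names are hypothetical, not Arpeggio's.

```python
# Hypothetical toy (NOT Arpeggio's API): expressions are tuples such as
# ("lit", "+"), ("ref", "number") or ("seq", expr, expr, ...).


def refs(expr):
    """Yield every rule name referenced by an expression."""
    kind, *args = expr
    if kind == "ref":
        yield args[0]
    elif kind in ("seq", "choice"):
        for sub in args:
            yield from refs(sub)


def finalize(*grammars):
    """Compose grammars (later ones override) and require every reference to resolve."""
    rules = {}
    for grammar in grammars:
        rules.update(grammar)
    referenced = {name for expr in rules.values() for name in refs(expr)}
    missing = sorted(referenced - set(rules))
    if missing:
        raise ValueError(f"unresolved rules: {missing}")
    return rules


# An incomplete grammar: "number" is an extension point to be filled in later.
base = {"calc": ("seq", ("ref", "number"), ("lit", "+"), ("ref", "number"))}
full = finalize(base, {"number": ("lit", "1")})
```

`finalize(base)` on its own raises `ValueError` naming `number`; supplying a grammar that defines it makes the composition complete.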

Additionally, you should be able to use ParsingExpressions directly in the list of grammars, enabling work in a parser-combinator style.
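Mixing whole grammars with bare expression objects could look roughly like this, assuming each expression object carries its own rule name. The `Expr` class and `compose` function are hypothetical stand-ins, not Arpeggio's `ParsingExpression`.

```python
import re


class Expr:
    """Hypothetical toy parsing expression (not Arpeggio's ParsingExpression)."""

    def __init__(self, name, pattern):
        self.name = name
        self.regex = re.compile(pattern)

    def match(self, text, pos=0):
        """Return the end position of a match at `pos`, or None."""
        m = self.regex.match(text, pos)
        return m.end() if m else None


def compose(*items):
    """Accept whole grammars (dicts of name -> Expr) or bare Expr objects."""
    rules = {}
    for item in items:
        if isinstance(item, Expr):
            rules[item.name] = item  # combinator style: the object names itself
        else:
            rules.update(item)       # a grammar contributes all of its rules
    return rules


grammar = {"number": Expr("number", r"\d+"), "word": Expr("word", r"[a-z]+")}
decimal = Expr("number", r"\d+(\.\d+)?")  # used directly, overriding "number"
rules = compose(grammar, decimal)
```

The bare `decimal` expression replaces the grammar's `number` rule, so `rules["number"].match("3.14")` consumes all four characters, while the original rule stops after `3`.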

All this requires some non-trivial changes to the core, though.

ninmesara commented 8 years ago

It sounds excellent! Although I like the possibility of referring to rules of a different grammar, I think there should be a "blackbox" option that allows you to hide the inner rules of a grammar. This way you could use rules written by different authors without worrying about name collisions.
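One way a "blackbox" option could work is to rename every non-exported rule of a library grammar under a private prefix, rewriting references to match. A toy sketch over tuple-encoded expressions; `rename`, `blackbox` and the `prefix.name` scheme are hypothetical.

```python
# Hypothetical toy (NOT Arpeggio's API): expressions are tuples such as
# ("lit", "0"), ("ref", "digits") or ("seq", expr, expr, ...).


def rename(expr, mapping):
    """Rewrite rule references in `expr` according to `mapping`."""
    kind, *args = expr
    if kind == "ref":
        return ("ref", mapping.get(args[0], args[0]))
    if kind in ("seq", "choice"):
        return (kind, *(rename(sub, mapping) for sub in args))
    return expr


def blackbox(prefix, grammar, exports):
    """Hide every non-exported rule of `grammar` behind `prefix`."""
    mapping = {name: name if name in exports else f"{prefix}.{name}" for name in grammar}
    return {mapping[name]: rename(expr, mapping) for name, expr in grammar.items()}


lib = {"number": ("seq", ("ref", "digits"), ("lit", ".")), "digits": ("lit", "0")}
boxed = blackbox("numlib", lib, exports={"number"})
user = {"digits": ("lit", "x")}  # the user's own rule with the same name
merged = {**user, **boxed}
```

The library's `digits` becomes `numlib.digits`, so the user's `digits` survives the merge. References to names the library does not define (extension points) are left untouched and still bind to the user's rules.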

Anyway, thanks for writing Arpeggio and making it available for free. It's a great library and the documentation is among the best I've ever read.

(Replying by email to Igor Dejanović's comment of Friday, 7 October 2016.)

vuvova commented 6 years ago

Just FYI, this is how I did it — https://github.com/vuvova/gdb-tools/blob/64a9280/duel/parser.py

The main grammar starts at line 52; note the token cast on line 72. Further up, you can see how that token is created as a separate Arpeggio parser, which is tried on line 26: if that parse succeeds, the token matches; otherwise it doesn't.

It'd be cleaner to inherit from Match rather than monkey-patch it, but Arpeggio doesn't allow that at the moment.
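The approach described above, where a token matches only if a separate parser succeeds at the current position, can be modelled in plain Python as a terminal that delegates to another parse function. This is a toy, not the code in the linked parser.py; `sub_parser` (here just a regex for an identifier with an optional `[n]` index) and `DelegatingToken` are hypothetical.

```python
import re

# Stand-in for a separately built parser (hypothetical): an identifier with
# an optional "[n]" index. Returns the end position, or None on failure.
_SUB = re.compile(r"[A-Za-z_]\w*(\[\d+\])?")


def sub_parser(text, pos):
    m = _SUB.match(text, pos)
    return m.end() if m else None


class DelegatingToken:
    """Toy terminal that matches iff `parse_fn` succeeds at the current position."""

    def __init__(self, name, parse_fn):
        self.name = name
        self.parse_fn = parse_fn

    def match(self, text, pos):
        end = self.parse_fn(text, pos)
        if end is None or end == pos:
            return None  # sub-parser failed or consumed nothing: no match
        return (self.name, text[pos:end], end)


token = DelegatingToken("lvalue", sub_parser)
```

For example, `token.match("x = arr[3];", 4)` returns `("lvalue", "arr[3]", 10)`, while `token.match("x = 3;", 4)` returns `None`. Keeping the delegation in its own class is the kind of structure a `Match` subclass would give instead of monkey-patching.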

igordejanovic commented 6 years ago

Thanks. It would indeed be better to use a new class inherited from Match. What do you get if you try to inherit? I haven't tried it myself, but it should generally work, or at least it should be easy to fix if it doesn't work at the moment. I looked into the implementation of parser construction, and instances of general Match-derived classes should be handled at this line.

vuvova commented 6 years ago

Maybe I used an older version? As far as I remember, there was an isinstance(..., Match) check.

You can try to inherit with a dummy class, like

from arpeggio import Match

class MatchChild(Match):
    pass

and see where it won't work. It should be easily fixable, I agree.
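A note on the `isinstance(..., Match)` check: in Python, `isinstance` already accepts instances of subclasses, so a dummy subclass would more likely be rejected by a stricter test, such as an exact type comparison, or by code that special-cases particular classes. The thread does not say which check failed; this stand-in `Match` class (not `arpeggio.Match`) only shows the difference between the two kinds of check.

```python
class Match:
    """Stand-in base class (hypothetical; not arpeggio.Match)."""


class MatchChild(Match):
    pass


def accepted_by_isinstance(expr):
    return isinstance(expr, Match)  # subclass instances pass


def accepted_by_exact_type(expr):
    return type(expr) is Match      # subclass instances are rejected
```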