we-like-parsers / pegen

PEG parser generator for Python
https://we-like-parsers.github.io/pegen/
MIT License

Any plans to make a more generic version? #67

Open StyXman opened 2 years ago

StyXman commented 2 years ago

I'm trying to generate a parser for a language that includes attribute names that contain -s and hex colors that begin with #.

The first one could be solved by reconstructing the name from NAME ('-' NAME)*, but that would also accept a - b as a name, which it is not.
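To illustrate why rebuilding a hyphenated name from separate NAME and - tokens is ambiguous, here is a minimal sketch using only the standard-library tokenize module (no pegen involved). The helper names significant_tokens and is_contiguous are made up for this example. It shows that a-b and a - b produce the same token types and strings, and that only the token positions tell them apart.

```python
import io
import tokenize


def significant_tokens(source):
    """Return (type name, string) pairs, skipping layout-only tokens."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in skip
    ]


def is_contiguous(source):
    """True if every NAME/OP token starts exactly where the previous one ended."""
    readline = io.StringIO(source).readline
    toks = [
        tok
        for tok in tokenize.generate_tokens(readline)
        if tok.type in (tokenize.NAME, tokenize.OP)
    ]
    return all(prev.end == cur.start for prev, cur in zip(toks, toks[1:]))


joined = significant_tokens("a-b\n")
spaced = significant_tokens("a - b\n")

print(joined)                    # token types and strings for "a-b"
print(joined == spaced)          # the grammar sees the same stream for both
print(is_contiguous("a-b\n"))    # positions still distinguish the two inputs
print(is_contiguous("a - b\n"))
```

A grammar action could in principle reject non-adjacent tokens by comparing their start and end positions like this, but that is a workaround and not something this thread confirms pegen supports out of the box.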

The second one forces us to use a second parser to separate the hex portion from, for instance, any trailing comments (#abcdef // foo).
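The hex color problem can be reproduced with the standard-library tokenizer alone: everything from # to the end of the line comes back as a single COMMENT token. The sketch below shows that, followed by the kind of second pass described above. The HEX_COLOR regex is a hypothetical illustration that only handles 3- and 6-digit colors.

```python
import io
import re
import tokenize

source = "#abcdef // foo\n"

# The Python tokenizer treats "#" as the start of a comment, so the color
# and the trailing "// foo" arrive together in one COMMENT token.
first = next(tokenize.generate_tokens(io.StringIO(source).readline))
print(tokenize.tok_name[first.type], repr(first.string))

# Hypothetical second pass: split the hex color off the front of the comment.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")
match = HEX_COLOR.match(first.string)
color = match.group(0)
rest = first.string[match.end():].strip()
print(color, "|", rest)
```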

MatthieuDartiailh commented 2 years ago

#65 would allow the use of an alternate tokenizer and may solve some of your issues. It is currently pending review.

StyXman commented 2 years ago

It's way over my pay grade, so to speak. I'll wait for it, then.

jpsnyder commented 2 years ago

Maybe I'm missing something, but can pegen be used for non-Python code? I was looking into using this for my own project: I already have a working lexer, but I would like to swap out the LALR parser for PEG, so this seemed promising. Could any lexer be used, provided the iterated tokens have the expected attributes?

I was looking for an example of pegen being used on something that isn't Python just to see a proof of concept, but I can't seem to find anything. Everything uses import tokenize, which is for Python tokens.

MatthieuDartiailh commented 2 years ago

As mentioned in my previous answer, there is a pending PR introducing a generic lexer interface that would allow you to use a custom lexer. However, the other maintainers are part of the Python release team and do not have the bandwidth to review this PR or other pending PRs at the moment.

jpsnyder commented 2 years ago

Yeah, excuse my ignorance. I'm having a hard time wrapping my head around how to get started with pegen.
I might work off your branch for the time being, although I was hoping there was a super simple example of how to use pegen without the Python tokenizer to help me get started.
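As a rough starting point for the "tokens with the expected attributes" idea, here is a small hand-written lexer that yields standard-library tokenize.TokenInfo tuples without using Python's tokenizer. Everything specific in it is an assumption made for illustration: the TOKEN_SPEC table, the lex function, and the choice to report hex colors as token.STRING. Whether pegen accepts such a stream unchanged is not confirmed anywhere in this thread, so treat this as a sketch of the lexer side only.

```python
import re
import token
from tokenize import TokenInfo

# (rule name, regex, token type to emit or None to drop the match).
# All rules and the type mapping are hypothetical choices for this sketch.
TOKEN_SPEC = [
    ("COLOR", r"#[0-9a-fA-F]{6}\b", token.STRING),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*(?:-[A-Za-z0-9_]+)*", token.NAME),
    ("COMMENT", r"//[^\n]*", None),
    ("SKIP", r"[ \t]+", None),
    ("NEWLINE", r"\n", token.NEWLINE),
    ("OP", r"[:;{}]", token.OP),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat, _ in TOKEN_SPEC))
TYPES = {name: tok_type for name, _, tok_type in TOKEN_SPEC}


def lex(source):
    """Yield TokenInfo tuples for a tiny CSS-like language."""
    lines = source.splitlines(keepends=True)
    for lineno, line in enumerate(lines, start=1):
        pos = 0
        while pos < len(line):
            m = MASTER.match(line, pos)
            if m is None:
                raise SyntaxError(
                    f"unexpected character {line[pos]!r} at {lineno}:{pos}"
                )
            tok_type = TYPES[m.lastgroup]
            if tok_type is not None:  # comments and blanks are dropped
                yield TokenInfo(
                    tok_type, m.group(), (lineno, m.start()), (lineno, m.end()), line
                )
            pos = m.end()
    end = (len(lines) + 1, 0)
    yield TokenInfo(token.ENDMARKER, "", end, end, "")


tokens = list(lex("border-color: #abcdef // foo\n"))
for tok in tokens:
    print(token.tok_name[tok.type], repr(tok.string))
```

In this sketch border-color comes out as a single NAME token and #abcdef is separated from the trailing // comment at the lexer level, which covers both problems raised at the top of the thread without a second parsing pass.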