sirthias / parboiled2

A macro-based PEG parser generator for Scala 2.10+

More parser examples #465

Open phdoerfler opened 1 year ago

phdoerfler commented 1 year ago

I'd appreciate a larger list of example parsers. There is the CSV parser and JSON parser, as well as the two calculator parsers and the ABC parser, of course. However, given sufficiently low caffeine levels, these and the documentation are not enough. I expect that my parser writing endeavors would be helped by more examples to draw inspiration from. Is there some place where people contribute their Parboiled2 parsers? GitHub's Discussions tab or the wiki could be a great place for this. ANTLR has its own repository with lots of examples.

To provide some context as to what kind of problem I am trying to solve, imagine sbt's "inspect tree" output:

[info] Loading global plugins from /home/foobar/.sbt/1.0/plugins
[info] Loading project definition from /work/space/scratch/project
[info] Set current project to scratch (in build file:/work/space/scratch/)
[info] some-project/*:packModuleEntries = Task[scala.collection.Seq[xerial.sbt.pack...
[info]   +-some-project/*:packDuplicateJarStrategy = latest
[info]   +-some-project/*:packExcludeArtifactTypes = List(source, javadoc, test, it,..
[info]   +-some-project/*:packExcludeJars = List(.*log4j.*, .*logback.*, .*specs2.*)
[info]   +-some-project/*:packModuleEntries::streams = Task[sbt.std.TaskStreams[sbt...
[info]   | +-*/*:streamsManager = Task[sbt.std.Streams[sbt.Init$ScopedKey[_ <: Any]]]
[info]   | 
[info]   +-some-project/*:update = Task[sbt.UpdateReport]
[info]     +-*/*:appConfiguration = xsbt.boot.AppConfiguration@ab2e887
[info]         

Say you are interested in the task and settings, e.g., some-project/*:packModuleEntries::streams, perhaps their indentation depth, and don't care about the rest. There are multiple lines, not all of which contain mention of a task or setting. Is it better to treat it as one big string and write a parser such as

ContentWeAreInterestedIn.separatedBy(ContentWeIgnore)

? Or is it better to treat it as individual lines, remove the [info] bit in the front separately, maybe even apply another rule on top of that now cleaned-up line using a subparser or the ~> operator? When dealing with separate lines, how does one best filter out lines that do not contain a task or setting, such as the first few lines and the fourth-to-last line that only contains a |? I think there are multiple valid approaches to this, and I write parsers infrequently enough that it feels more like an uphill battle to figure this out vs. just using RegEx, .filter, etc., again.

In summary, a cookbook where users could contribute small examples sounds (to me) like a great idea, especially for slightly messy inputs. What are your thoughts?

sirthias commented 1 year ago

I agree that such a cookbook would be nice but I assume that at this point there won't be any capacity on anyone's side to really put sth like that together. It's a lot of work to do this nicely!

One real-life example of a larger pb2 parser construct is the set of akka-http header parsers here: https://github.com/akka/akka-http/tree/main/akka-http-core/src/main/scala/akka/http/impl/model/parser

The UriParser for example...

But these won't really help you with your problem. As you say, there are many valid approaches that might get you to your goal. I'd say, just pick the one you are most comfortable with! If it's regexes use regexes!

In general, "real" parsers shine when you have languages (or problems) that allow for recursion, because these are inherently difficult or even impossible to parse with simpler approaches like regular expressions.
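To make the recursion point concrete, here is a minimal hand-written sketch in plain Scala (no parboiled2 involved; the object and method names are made up for illustration). It recognizes arbitrarily nested parentheses and reports the maximum nesting depth, which classic regular expressions cannot do for unbounded nesting.

```scala
object Nesting {
  // Informal grammar: Group <- '(' Group* ')'
  // Returns (position after the group, nesting depth of the group),
  // or None if no well-formed group starts at `pos`.
  def group(s: String, pos: Int): Option[(Int, Int)] =
    if (pos >= s.length || s(pos) != '(') None
    else {
      var i = pos + 1          // position after the opening parenthesis
      var inner = 0            // deepest nesting seen among child groups
      var next = group(s, i)   // recursion: a group may contain groups
      while (next.isDefined) {
        val (end, depth) = next.get
        i = end
        inner = math.max(inner, depth)
        next = group(s, i)
      }
      if (i < s.length && s(i) == ')') Some((i + 1, inner + 1)) else None
    }

  // Succeeds only if a single group spans the whole input.
  def depthOf(s: String): Option[Int] =
    group(s, 0).collect { case (end, depth) if end == s.length => depth }
}
```

For example, `Nesting.depthOf("(()(()))")` yields `Some(3)`, while the unbalanced input `"(()"` yields `None`.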

Also, if you know your input is always correctly formatted and will not contain syntax errors that need to be properly reported to the user, things become much easier. (As in your case.) A full-blown parser might be overkill, especially if you are not used to writing them. For someone else a quick parser might be easier to write and read than a cryptic regex. As always, it all depends... :)

phdoerfler commented 1 year ago

Thanks for the write-up! Indeed, the akka-http parsers are good examples. Perhaps not for my specific problem, but possibly for somebody else's. And yes, for my use case regexes and .filter and so on would suffice and be quicker to implement. However, I do have a background in grammars and in using parser generators such as JFlex, and I have written parsers by hand before. I've also used FastParse quite a bit, albeit with similar struggles. I have yet to give ANTLR4 a try. If only it had proper Scala integration. Maybe that's the salvation. But I digress.

Regarding the efforts writing a cookbook: I share your concerns but I think a little can already go a long way. Simple things like "how would I write this RegEx in Parboiled2?". Imagine a table, e.g.,

| Regex | Parboiled 2 Rule | Notes |
|---|---|---|
| . | ANY | |
| (.) | capture(ANY) | |
| .* | zeroOrMore(ANY) | |
| (?.*) | &(capture(zeroOrMore(ANY))) | Bit of a constructed example |
| ([^#]*)# | capture(zeroOrMore(!"#" ~ ANY)) ~ "#" | |

Or a simple parser showing how to translate this Scala code:

val s = """abc match
|def
|ghi match
|jkl match
|mno""".stripMargin.split("\n").filter(_.contains("match")).map(l => l.split(" ").head)

into an equivalent Parboiled2 parser. Little things from which people can then gather a more hands-on understanding of how a PEG parser and parboiled2 in particular can be used.
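For illustration, one possible translation might look like the sketch below. It is not taken from the parboiled2 documentation; the class and rule names are hypothetical, and it assumes the parboiled2 library is on the classpath. It also approximates `contains("match")` by requiring the first word to be directly followed by " match", which is equivalent for the sample input but not in general.

```scala
import org.parboiled2._
import scala.util.Try

// Hypothetical example parser: one Option[String] per input line.
class MatchLineParser(val input: ParserInput) extends Parser {

  // All lines, separated by newlines, up to the end of the input.
  def Lines: Rule1[Seq[Option[String]]] = rule {
    oneOrMore(Line).separatedBy(Newline) ~ EOI
  }

  // Ordered choice: try the interesting shape first, otherwise skip the line.
  def Line: Rule1[Option[String]] = rule {
    MatchLine ~> ((w: String) => Option(w)) | RestOfLine ~ push(Option.empty[String])
  }

  // First word of the line, followed by " match" and anything else.
  def MatchLine: Rule1[String] = rule {
    capture(oneOrMore(noneOf(" \n"))) ~ " match" ~ RestOfLine
  }

  def RestOfLine: Rule0 = rule { zeroOrMore(noneOf("\n")) }

  def Newline: Rule0 = rule { '\n' }
}

object MatchLineDemo {
  // Plays the role of `.filter(...).map(...)`: drop skipped lines, keep first words.
  def firstWords(text: String): Try[Seq[String]] =
    new MatchLineParser(text).Lines.run().map(_.flatten)
}
```

With the five-line sample above, `MatchLineDemo.firstWords` is intended to succeed with the words abc, ghi and jkl. The design choice here is to let every line produce a value (`Some` or `None`) so that both alternatives of `Line` have the same type, and to do the filtering after parsing.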

If this has a chance, then it could live in the wiki or the discussions tab: some place where people can collaborate. If you still think it's a lost cause then I shall curb my optimism and not speak of it any further ;) Otherwise: I do have one or two small parsers to contribute.

sirthias commented 1 year ago

Ok, I see. Maybe it's useful to look at other PEG parsing tools for more info, tutorials, cookbooks, etc? After all, parboiled is just one of many. And the logic of PEG is the same everywhere really.

I for my part find PEG parsing much easier to understand than other approaches like LL and definitely LR. After all PEGs are the closest to the simple recursive descent parsers that one would usually write by hand if one starts out. Simply try out all options one by one and backtrack if sth fails. No lookahead, nothing fancy. Stupid trial and error. Just my style... ;-)
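The "try the options one by one and backtrack" idea can be sketched in a few lines of plain Scala. This is a toy model written for illustration only, not how parboiled2 is implemented; all names are invented.

```scala
object TinyPeg {
  // A parser takes the input and a start position and returns
  // the position after the match, or None on failure.
  type P = (String, Int) => Option[Int]

  // Match a literal string at the current position.
  def lit(t: String): P =
    (s, i) => if (s.startsWith(t, i)) Some(i + t.length) else None

  // Sequence: run `a`, then run `b` where `a` stopped.
  def seq(a: P, b: P): P =
    (s, i) => a(s, i).flatMap(j => b(s, j))

  // Ordered choice: try `a`; only if it fails, retry `b` from the SAME position.
  def choice(a: P, b: P): P =
    (s, i) => a(s, i).orElse(b(s, i))

  // True if `p` consumes the whole input.
  def matches(p: P, s: String): Boolean =
    p(s, 0).contains(s.length)
}
```

With these definitions, `choice(seq(lit("ab"), lit("c")), seq(lit("a"), lit("bd")))` accepts "abd": the first alternative fails at "c", so the second one restarts from position 0. One consequence worth knowing is that a choice is never revisited once an alternative has succeeded, so `seq(choice(lit("a"), lit("ab")), lit("c"))` rejects "abc", whereas the regex `(a|ab)c` would accept it. Putting the longer alternative first fixes that.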

phdoerfler commented 1 year ago

Ok, I wasn't sure how interchangeable PEG parsers would be. Do you know of any such cookbook? I know that Fastparse has a few examples and I assume there is quite some material out there for Scala parser combinators. But those support things that Parboiled can't do (for instance, PCREs), so they are probably not as helpful. And then there is of course also the mailing list for parboiled2. Somehow that was not on my radar. Related: For PCREs there are many tools available that even graphically highlight how the regex and input are processed step by step. I would like something like this for PEG parsers, too :)