robrix / Madness

Recursive Descent Into Madness
MIT License
291 stars 17 forks source link

Recursive Grammars #40

Closed regexident closed 9 years ago

regexident commented 9 years ago

So I was messing around with Madness and tried implementing a very rudimentary regex (as in "real" regex) parser using Madness:

<expression> ::= <union> | <term>
<union> ::= <term> '|' <expression>
<term> ::= { <factor> }
<factor> ::= <base> { '*' }
<group> ::= '(' <expression> ')'
<base> ::= <char> | '\' <char> | <group>

A crude translation to Madness would look like this I guess:

let expression = union | term
let union = term ++ ignore("|") ++ expression
let term = factor *
let factor = base ++ ignore("*") *
let group = ignore("(") ++ expression ++ ignore(")")
let base = any | ignore("\\") ++ any | group

The obvious first question that comes to mind (apart from hunting down left-recursions, etc) is:

How does one define recursive grammars in Madness like the one seen above?

Obviously the compiler complained about unresolved identifiers. Just as I'd expect it to do in such a situation.

Would you mind to elaborate on this very briefly (or even add an example to the playground)?

robrix commented 9 years ago

You’ll find that it doesn’t give you the complaint about unresolved identifiers when you’re in a top-level context of e.g. a library or app; it only does that when you’re in a function, or in a file that’s run like a script—a #! file or main.swift or a playground.

Unfortunately, library/app cases compile just fine, they deadlock at runtime. If that were the end of the story, Madness would only be able to parse regular grammars, which would be a letdown.

Fortunately, there’s a good solution to both. There’s even an example of it in the playground already!

Since we can’t define recursive grammars directly, we instead need to use a fixpoint to allow us to define them indirectly. Prelude includes a function, fix, which does exactly this:

let expression = fix { expression in
    let union = term ++ ignore("|") ++ expression
    let term = factor*
    let factor = base ++ ignore("*")*
    let group = ignore("(") ++ expression ++ ignore(")")
    let base = any | ignore("\\") ++ any | group
    return union | term
}

This clearly needs to be documented in the README (at a minimum) since it’s non-obvious if you haven’t done it before.

robrix commented 9 years ago

For non-executing contexts, this is a little simpler than I described above :point_up:

You don’t actually have to place everything inside the fix closure; the union, term, etc. bindings can be placed outside of it. All you need the fix closure to return is the value that should provide the value of expression:

let union = …
…
let expression = fix { _ in union | term }

The reason is that fix will return a function which correctly evaluates the alternation of union and term, but since it delays the evaluation until after initialization time, there’s no deadlock when initializing the symbols.

For executing contexts—#! scripts, main.swift, the REPL, and playgrounds—you still need to initialize everything using expression inside the fix closure; they can’t deal with the mutually recursive symbols in the first place.

regexident commented 9 years ago

Hi Rob,

thanks for your response! :+1: (and sorry for not getting back to you any sooner :pensive: )

So for parsers that depend on themselves (direct recursions) one can use fix.

But how about indirect recursions? … such as term, which depends on factor, which depends on base, which depends on group, which depends on expression, which depends on term?

Cheers, Vincent

robrix commented 9 years ago

Same answer, actually! If it’s an executing context, you have to put all of them inside the fix (which may be awkward/impossible for e.g. mutual recursion), but in a non-executing context you can just wrap the one key definition with fix.

The lambda calculus example in the playground is a good example of the former, while the latter is just:

let expression = fix { _ in union | term }

All it’s really doing is letting Swift finish binding the value of let expression before that value is needed to (eagerly) bind the values of union and term. I think this would be an equivalent solution:

let expression = { (union | term)($0) }

This is directly equivalent to let expression = union | term, but the indirection—wrapping the parsers used to implement expression in an explicit closure—allows Swift to terminate without deadlocking.

robrix commented 9 years ago

Going to close this out now.