Open GoogleCodeExporter opened 9 years ago
Started planning. This seems like a useful addition, even though the resulting
language is no longer a CFG.
Original comment by aohelin
on 28 May 2012 at 5:16
Not added yet since I'm still not sure what needs to be done. There seem to be
two options:
a) allow backreferences as special nonterminal references which expand
nondeterministically to a value 'up in the tree'. The rules how one would like
the 'up in the tree' part to work in small test grammars can be captured by a
simple heuristic (a parent node exposes all expansions from its other branches
which are present in all possible expansions), but this still seems like a hack.
b) parameterize rules and handle the references explicitly.
c) use context sensitive grammars.
Original comment by aohelin
on 8 Jun 2012 at 6:52
*three
Original comment by aohelin
on 8 Jun 2012 at 6:53
Defining closed lambda-terms is a good simple test case for something like
this. Preliminary syntax:
output = term ("\n" term)*
var = [a-z]+
lexp = "λ" (var) "." term
term = \var | lexp | "(" term " " term ")"
This tries to shoehorn the familiar (?) regular expression binding syntax to
CFG context. (var) creates a shareable named binding, as opposed to indexed in
regexps. \var links to a random shared expansion of var-rule above it in the
(un)parse tree. Output is a sequence of terms to check that the bindings don't
leak sideways in branches.
Original comment by aohelin
on 17 May 2013 at 10:38
note that it would also be useful to be able to refer to an expansion within
the same rule, or more generally limit lookup distance, as in:
exp = [a-z]+ | "<" (tag) attr* ">" exp* "</" \tag ">"
But what should the syntax be? Are these the 99% of use cases for shared
content?
Original comment by aohelin
on 17 May 2013 at 11:17
Haven't figured out anything wrong with the following syntax/semantics, so will
probably start working on it soonish:
xexp = "<" \([a-z]{1,8}\) ">" xexp+ "<" \1 ">" <- local binding, indexed from left
| [a-z]+
var = [a-z]+ <- all rules can be referred
comb = "S" | "K" | "I"
lexp = "λ" var "." term
term = \var | lexp | comb | "(" term (" " term)+ ")" <- all vars are bound
Original comment by aohelin
on 22 May 2013 at 3:55
Added rhs-local backreferences, which seem to be distinct from up/down
references and have relatively clear scoping rules.
$ blab -e 'xexp = "<" \([a-z]{1,8}\) ">" xexp+ "</" \1 ">" | [a-z]+'
<wnlcvgf>ecqnkhyu<nfptn>jiznzrfwhu</nfptn></wnlcvgf>
Original comment by aohelin
on 2 Jun 2013 at 6:34
Original comment by aohelin
on 2 Jun 2013 at 6:37
Original issue reported on code.google.com by
aohelin
on 21 May 2012 at 2:48