microsoft / Trieste

A term rewriting system for experimental programming language development.
MIT License
37 stars 20 forks source link

Added parser support for checking parent of current group #90

Closed EliasC closed 4 months ago

EliasC commented 9 months ago

This PR adds a single function to the Make class that allows checking the type of the parent of the current group during parsing. Just as m.in(Foo) returns true if the cursor is in a Foo node, m.group_in(Foo) returns true if the cursor is in a group which in turn is in a Foo node.

A concrete use case for this is being able to parse lists with prefix separator:

match n with
| 0 -> "zero"
| 1 -> "one"
| _ -> "a lot"

For infix separators we can use seq(Bar) to push the current group under some separator. This is the behaviour we want for the second and third | above, but for the first | seq(Bar) will make the group match n with a child of the first |. Instead, we want to do push(Bar) for the first |, but using that for the latter ones will nest the lists in each other. With group_in we can write the parser for Bar as

if (m.group_in(Bar))
  m.seq(Bar);
else
  m.push(Bar);

Another use case is to be able to conditionally terminate groups based on their parents:

(entry1, entry2, block: subentry1, entry3)

Here, block: subentry1 creates a deeper structure within the list and the last comma should get us back to the first level. Because we are in a group when parsing subentry1, currently the only way to see if the comma should terminate the current group or a group higher up is to speculatively terminate the group and see if we did the right thing (and if we didn't, there is no way to backtrack). With group_in we can parse commas like so:

while (m.group_in(Block))
  m.term({Block});

m.seq(Comma);

While both of these use cases might be solvable using further rewrite passes (or parsing modes, or other extra machinery in the parser), I would argue that group_in fits nicely into the idea of parsing as a series of local decisions based on the latest input string and the current position in the parse tree. The programmer never has to deal with creating groups, but once a group has been created it makes sense to be able to ask questions about that group. The in(Foo) method lets us ask "are we in the beginning of parsing (the inside of) a Foo?" and in_group(Foo) lets us ask "are we in the middle of parsing (the inside of) a Foo?".

Both of these examples comes from writing a parser for shrubbery notation, which is also indentation sensitive, meaning there is some grouping information that may only be available during parsing (or that we at least don't want to have to carry into the rewriting passes). This also complicates the approach of "parse stupidly, rewrite cleverly".

EliasC commented 9 months ago

@mjp41 I could always add another sample which uses this feature. Or were you thinking something smaller, more unit test-like?

EliasC commented 8 months ago

I'm fixing up the Shrubbery parser to add as test cases (and a sample) to add to this PR.