Closed let-def closed 1 year ago
I just added support for the `$startloc(x)` and `$endloc(x)` "keywords" to refer to the location of captured variables. `token` is also bound to the current lookahead token, located at `$startloc(token)` and `$endloc(token)`.
In the previous version the lookahead token was not treated specially: if one cared about it, it had to be passed manually as an extra argument. It is now mandatory to pass the lookahead token when invoking the matcher. This makes it possible to implement a "lookahead constraint" on clauses, see https://github.com/let-def/lrgrep/blob/lrgrep-v3/ocaml/parse_errors.mlyl#L87 for instance. There is no longer any need to make these actions partial and match on the token manually.
In the future this will also make it possible to implement a precise coverage checker.
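To picture what a lookahead constraint on a clause means, here is a toy model in Python. It is only a sketch and not lrgrep's implementation or API: `Clause`, `run_matcher` and the token names are hypothetical, and the stack pattern is reduced to a plain predicate on a list of symbols.

```python
from typing import Callable, NamedTuple, Optional


class Clause(NamedTuple):
    # Hypothetical stand-in for a compiled stack pattern.
    matches_stack: Callable[[list], bool]
    # Tokens accepted as lookahead; None means "any lookahead token".
    lookaheads: Optional[frozenset]
    message: str


def run_matcher(clauses, stack, lookahead):
    """Return the message of the first clause that accepts both the
    stack and the lookahead token, or None if no clause applies."""
    for clause in clauses:
        if not clause.matches_stack(stack):
            continue
        if clause.lookaheads is not None and lookahead not in clause.lookaheads:
            continue  # the lookahead constraint rejects this clause
        return clause.message
    return None


# Two clauses with the same stack pattern; only the first is constrained.
clauses = [
    Clause(lambda s: s[-1:] == ["LPAREN"], frozenset({"EOF"}),
           "unclosed parenthesis"),
    Clause(lambda s: s[-1:] == ["LPAREN"], None,
           "expecting an expression after '('"),
]
```

Because the matcher always receives the lookahead, the constraint is checked before the action is selected, so the action itself never has to inspect the token and fall through.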
I just rebased and merged into master. Main development will continue there.
This is the third redesign of the approach. There are a few fundamental changes here, most notably in the pattern language.
A summary of the syntax, in pseudo-BNF:
The main differences from the previous version are that reductions are now delimited, that we distinguish between greedy and non-greedy variants of recursive constructions, and that filters make it possible to restrict a set of matches without consuming anything themselves.
`[...]` was used to introduce items and has been repurposed to trigger reductions!
Reductions
In the previous version, reductions were introduced with `!`, which tried to reduce its left-hand side as far as possible. Now the target of the reduction must be entirely specified within square brackets `[...]`. For instance, `...foo... expr !` matched anything that reduces to an `expr` and possibly more, if other matches were found in the prefix. The problem was that we could not tell exactly where the reduction stopped. Since we cannot capture semantic values inside a reduction, it was not possible to tell from the syntax what could be captured. This example would now be written `...foo... [expr]`, and it is clear that we only reduce `expr`. Captures are allowed anywhere in `...foo...`, and `loc=[expr]` captures the location of the symbols being reduced.
The error rules for OCaml have been ported to: https://github.com/let-def/lrgrep/blob/lrgrep-v3/ocaml/parse_errors.mlyl
Greediness
The greedy and non-greedy variants control the span of the capture: if there are multiple matches, do we want the shortest one (with `[]`) or the longest one (with `[[]]`)? For instance, when working with the arithmetic grammar and the input `a + b + c`:
- `x=expr + y=[expr]` binds `x` to `b` and `y` to the location of `c`
- `x=expr + y=[[expr]]` binds `x` to `a` and `y` to the location of `b + c`
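The two bindings above can be reproduced with a small Python toy. This is a sketch under strong assumptions, not how lrgrep works: it operates on a flat token list rather than an LR stack, `is_expr` and `match` are hypothetical helpers, and "reduces to `expr`" is approximated by "looks like `operand (+ operand)*`".

```python
def is_expr(tokens):
    """True if tokens look like `operand (+ operand)*`."""
    if len(tokens) % 2 == 0:
        return False
    for i, tok in enumerate(tokens):
        # Even positions must be operands, odd positions must be "+".
        if (i % 2 == 0) == (tok == "+"):
            return False
    return True


def match(tokens, greedy):
    """Toy version of `x=expr + y=[expr]` (greedy=False) and
    `x=expr + y=[[expr]]` (greedy=True) against the end of the input."""
    candidates = []
    for start in range(2, len(tokens)):
        # y is the suffix starting at `start`; it must be preceded by
        # "+" and by an operand that gets bound to x.
        if (tokens[start - 1] == "+"
                and tokens[start - 2] != "+"
                and is_expr(tokens[start:])):
            candidates.append(start)
    if not candidates:
        return None
    # The longest suffix starts earliest, the shortest starts latest.
    start = min(candidates) if greedy else max(candidates)
    return {"x": tokens[start - 2], "y": " ".join(tokens[start:])}
```

On `a + b + c`, `match(tokens, greedy=False)` binds `x` to `b` and `y` to `c`, while `match(tokens, greedy=True)` binds `x` to `a` and `y` to `b + c`, matching the two cases listed above.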
Items
In the previous versions, there were two kinds of atoms:
Now symbols (with wildcards) are the only kind of atom. Items can now be expressed with the `/` filter construction, which doesn't consume anything on the stack but restricts the possible matches.
For instance, the old rule `x=[label_declaration: mutable_flag LIDENT . COLON]` can now be written `x=LIDENT / label_declaration: mutable_flag LIDENT . COLON`. On the one hand it is a bit redundant (`LIDENT` is written twice), but on the other hand, it is clear that only `LIDENT` is consumed from the stack (and will be bound to `x`). Furthermore, the filtering semantics compose better with other constructions (one can filter a reduction, a star, etc.).
Misc
Following #7, the DFA is now minimized using Valmari's algorithm.
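For readers unfamiliar with DFA minimization, here is a minimal Python sketch of the idea. Note that this is not Valmari's algorithm: it is the simpler (and slower) Moore-style partition refinement, shown only to illustrate what "minimized" means. It assumes a complete DFA given as a hypothetical `delta` dictionary keyed by `(state, symbol)`, and it does not remove unreachable states.

```python
def equivalence_classes(states, alphabet, delta, accepting):
    """Map each state to a block id such that two states share a block
    exactly when no input word distinguishes them (Moore refinement)."""
    # Start from the accepting / non-accepting split.
    block_of = {s: int(s in accepting) for s in states}
    while True:
        # Signature: the state's own block plus the blocks reached
        # by each symbol of the alphabet.
        signature = {
            s: (block_of[s],) + tuple(block_of[delta[s, a]] for a in alphabet)
            for s in states
        }
        ids = {}
        refined = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        # Refinement only ever splits blocks, so an unchanged block
        # count means the partition is stable.
        if len(set(refined.values())) == len(set(block_of.values())):
            return refined
        block_of = refined
```

Merging all states that share a block id yields the minimal automaton for the reachable part of the DFA.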