This is cool, thanks for investigating this!
My first thought is: wow, that's a fairly sophisticated example tokenizer you have there (the JS-template-like one). I'm not sure I could have written it myself; I wonder if we've made a tool that only you can use correctly ;-)
(As a minor point of style, I'd have kept the `rbrace` rule in `std`, and have my parser generate "unexpected }" errors.)
> Each rule is included at the earliest possible position in the rule list (i.e., the position where it is most able to match).
Can you explain what you mean by this? Order is reasonably important in Moo lexers, so it would be good to understand this (and make sure we get the semantics right). Is it that in normal behaviour, the rules are inserted in place of the `include`; and the "earliest possible position" constraint only applies if a rule would be included more than once?
I suppose using a dictionary for this means we can't allow multiple `include` directives in different places in the rule list of a state. I'm not sure whether this would be common in practice.
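To make the dictionary limitation concrete: a JavaScript object literal cannot hold the same key twice, so a second `include` key overrides the first instead of marking a second insertion point. A minimal plain-JavaScript sketch (no moo calls; the state and rule names are only illustrative):

```javascript
// Illustrative state definition with two `include` keys in one object literal.
const state = {
  include: 'comment', // intended: insert the comment rules first
  id: /\w+/,
  include: 'std',     // intended: insert the std rules after `id`
};

// Only one `include` survives. It keeps the position of the first key
// but takes the value of the last one, so 'comment' is lost.
const keys = Object.keys(state);
const included = state.include;
```

So with a dictionary, the second insertion point has to be expressed some other way.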
I'm tempted to recommend writing this as a separate transform -- e.g. `moo.withStates()` -- which takes in a set of states with `include`s, and emits a set of states with all the includes expanded. But mostly I'd just find this easier to comprehend and test; there's no reason the public API has to be broken up this way.
> Is it that in normal behaviour, the rules are inserted in place of the `include`; and the "earliest possible position" constraint only applies if a rule would be included more than once?
Correct. So if `std` includes `comment`, and I include `comment` before I include `std`, the rules from `comment` match at the level of the `comment` inclusion.
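The ordering described here can be sketched as a standalone transform, in the spirit of the separate "expand the includes" step suggested earlier. This is not the code from the PR: `expandIncludes` is a hypothetical helper, states are assumed to be arrays of rule objects, rules are deduplicated by object identity, and skipping a state that is already being expanded is an assumed way to handle cycles.

```javascript
// Hypothetical helper (not part of moo): expands {include: ...} entries in place.
// Assumes each state is an array of rule objects or {include: name | [names]}.
function expandIncludes(states) {
  const expanded = {};
  for (const name of Object.keys(states)) {
    const seenRules = new Set(); // rules already emitted for this state
    const visiting = new Set();  // states currently being expanded (cycle guard)
    const out = [];
    const visit = (stateName) => {
      if (visiting.has(stateName)) return; // assumed cycle behaviour: skip
      if (!states[stateName]) throw new Error('unknown state: ' + stateName);
      visiting.add(stateName);
      for (const rule of states[stateName]) {
        if (rule.include) {
          // `include` may name one state or an array of states
          for (const target of [].concat(rule.include)) visit(target);
        } else if (!seenRules.has(rule)) {
          // the first occurrence wins, so a rule keeps its earliest position
          seenRules.add(rule);
          out.push(rule);
        }
      }
      visiting.delete(stateName);
    };
    visit(name);
    expanded[name] = out;
  }
  return expanded;
}

// Usage: `main` includes `comment` before `std`, and `std` includes `comment` too.
const lc = {name: 'lc', match: /\/\/.+/};
const id = {name: 'id', match: /[A-Za-z]\w*/};
const word = {name: 'word', match: /\w+/};
const result = expandIncludes({
  comment: [lc],
  std: [{include: 'comment'}, id],
  main: [{include: 'comment'}, word, {include: 'std'}],
});
const mainOrder = result.main.map(rule => rule.name); // lc, word, id
```

In `main`, `lc` stays at the position of the explicit `comment` include and is not repeated when `std` pulls it in again.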
> I suppose using a dictionary for this means we can't allow multiple `include` directives in different places in the rule list of a state. I'm not sure whether this would be common in practice.
That's a good point. I can think of two solutions:
    state: [
      {include: 'comment'},
      {name: 'id', match: /\w+/},
      {include: 'std'},
      // ...
    ],

    state: {
      include_comment: 1,
      id: /\w+/,
      include_std: 1,
      // ...
    },
(FWIW, `include` rules already optionally accept an array of states to include, e.g., `{include: ['ws', 'comment']}`, so this is only necessary if you want to include in two different places in the rule ordering.)
The `include` resolver is pretty separable; it's just this chunk of `compileStates`. Feel free to relegate it to its own function for testing.
I'd missed your suggestion of `$all` there.
I think this is cool, and includes are important! However, is there a good reason not to use JavaScript's object spread syntax?
    const ws = {
      ws: { match: /\s+/, lineBreaks: true },
    }
    const comment = {
      lc: /\/\/.+/,
      bc: /\/\*[^]*?\*\//,
    }
    const std = {
      ...comment,
      ...ws,
      id: /[A-Za-z]\w*/,
      op: /[!=]==|\+[+=]?|-[-=]|<<=?|>>>?=?|&&?|\|\|?|[<>!=/*&|^%]=|[~!,/*^?:%]/,
      tbeg: { match: /`(?:\\[^]|[^\\`])*?\${/, value: s => s.slice(1, -2), push: 'template' },
      tsim: { match: /`(?:\\[^]|[^\\`])*?`/, value: s => s.slice(1, -1) },
      str: { match: /'(?:\\[^]|[^\\'])*?'|"(?:\\[^]|[^\\"])*?"/, value: s => s.slice(1, -1) },
      lbrace: { match: '{', push: 'brace'},
    }
    const main = {
      ...std,
    }
    const brace = {
      ...std,
      rbrace: { match: '}', pop: 1 },
    }
    const template = {
      ...std,
      tmid: { match: /}(?:\\[^]|[^\\`])*?\${/, value: s => s.slice(1, -2) },
      tend: { match: /}(?:\\[^]|[^\\`])*?`/, value: s => s.slice(1, -1), pop: 1 },
    }
    const lexer = moo.states({
      $all: { err: moo.error },
      main, brace, template,
    })
> is there a good reason not to use JavaScript's object spread syntax?
I'm not sure it's a good reason, but a reason is that this implements adding rather than replacing rules. Compare:
    moo.states({
      base: {op: ['*', '/', '+', '-']},
      mod: {include: 'base', op: ['%']},
    })
where `mod` matches `*`, `/`, `+`, `-`, and `%` as `op`, with:
    const base = {op: ['*', '/', '+', '-']}
    const mod = {...base, op: ['%']}
    moo.states({base, mod})
where `mod` only matches `%` as `op`. I was originally doing this in my JS-ish lexer above to distinguish `/`-op from `/`-regex, but it became too complicated for an example, so I took it out.
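The replacing behaviour of spread can be checked in plain JavaScript, without calling moo at all. The snippet below reuses `base` and `mod` from the comparison above. `modAdded` is an illustrative name showing what you would have to write by hand to get the adding behaviour with spread alone.

```javascript
const base = {op: ['*', '/', '+', '-']};

// Spread copies base's keys first; the later `op` key then overwrites the copy.
const mod = {...base, op: ['%']};

// To keep the base operators with spread, the arrays must be concatenated by hand.
const modAdded = {...base, op: [...base.op, '%']};

const replacedOps = mod.op;   // only '%'
const addedOps = modAdded.op; // the four base operators followed by '%'
```

The manual concatenation works for one key, but it has to be repeated for every rule name that two states share.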
> this implements adding rather than replacing rules
I'm sold.
I'm still not sure if this is the right interface, as discussed previously. But I'd like to merge this and play around with it for a bit.
PR so you can play around with it. I'll add tests if this seems like the right idea.
This adds a new `include` rule, which allows you to include all the rules from another state.
`include` behaves as follows:
Here's an example of parsing JS-like templates:
Here's how it behaves with cycles: