Closed tjvr closed 5 years ago
I've rewritten this on top of the latest master.
Lexer#has() will now always return true, so most Nearley grammars should continue to work.
A few notes about the example and case insensitivity.
1. keyword: ['class', 'def'] should be keyword: ['CLASS', 'DEF'], since the purpose of the const caseInsensitiveKeywords is to lower case the keywords given.
2. CLASS, class, ClAsS, cLaSs, etc.
3. moo.keywords seems to be a roundabout way of achieving case insensitivity or other possibilities. Perusing moo.js, it seems that a simpler solution would be to allow an Array that is a mixture of strings and regular expressions in keywordTransform. Currently, only strings are allowed in the keyword array; otherwise an error is thrown indicating such. However, if regular expressions were allowed in addition to strings, you could do:
let lexer = compile({
  identifier: {
    match: [ /[Cc][Ll][Aa][Ss][Ss]/, /[Dd][Ee][Ff]/, 'lambda' /* I really only want this one as lower case */ ],
    type: v => v.toLocaleUpperCase(),
  },
})
When the Array contains only strings, proceed with the existing transform code that builds a switch statement; otherwise convert the strings in the array to regular expressions (quoting meta characters), then create a matchable regular expression in place of the switch statement being built. The function returned from keywordTransform would just match the token found against the built regular expression, e.g., token.match( rePossibilities ). I suspect that there will be a threshold between executing the switch statement vs. executing the regular expression match, which may be something else to consider in keywordTransform.
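The regular-expression variant described above could look roughly like the sketch below. It is not code from moo.js: buildKeywordMatcher and escapeRegExp are hypothetical names, and the sketch assumes that flags on the supplied regular expressions can be ignored.

```javascript
// Quote regular-expression metacharacters in a plain string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper: combine strings and regular expressions into one
// anchored alternation, replacing the generated switch statement.
// Assumption: flags on the input regular expressions are dropped.
function buildKeywordMatcher(patterns) {
  const sources = patterns.map(p =>
    p instanceof RegExp ? p.source : escapeRegExp(p)
  )
  const rePossibilities = new RegExp(
    '^(?:' + sources.map(s => '(?:' + s + ')').join('|') + ')$'
  )
  // The returned function only reports whether the token is a keyword.
  return token => rePossibilities.test(token)
}

const isKeyword = buildKeywordMatcher([
  /[Cc][Ll][Aa][Ss][Ss]/,
  /[Dd][Ee][Ff]/,
  'lambda', // matched in lower case only
])
```

Here isKeyword('ClAsS') and isKeyword('lambda') are true, while isKeyword('Lambda') is false because plain strings are matched exactly.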
For (1,2) above, perhaps I misunderstood the example in this issue; feel free to enlighten me.
I think you misunderstood the example; (1) and (2) don't sound right to me.
caseInsensitiveKeywords uses the keywords ['class', 'def'] passed in to build a regular, case-sensitive map using the built-in moo.keywords() function.
It then returns a closure which calls toLowerCase() on the value -- the token that was lexed -- before passing it to moo.keywords().
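As a minimal sketch of that description: the keywords function below is a simplified stand-in for moo.keywords(), written here only so the example runs without the library, and it is not moo's actual implementation.

```javascript
// Simplified stand-in for moo.keywords(): builds a case-sensitive lookup
// from keyword text to token type. (Assumption: the real implementation
// differs; this only mirrors the behaviour described above.)
function keywords(map) {
  const reverse = new Map()
  for (const type of Object.keys(map)) {
    for (const keyword of [].concat(map[type])) {
      reverse.set(keyword, type)
    }
  }
  return text => reverse.get(text)
}

// Wrap the case-sensitive lookup so the lexed text is lower-cased first.
function caseInsensitiveKeywords(map) {
  const transform = keywords(map)
  return text => transform(text.toLowerCase())
}

const typeOf = caseInsensitiveKeywords({ keyword: ['class', 'def'] })
```

With this, typeOf('ClAsS') and typeOf('DEF') both yield 'keyword', while the keyword list itself stays in lower case.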
We recently added a value transform.

This PR:
- type transform.
- moo.keywords().
- Lexer#has().
The existing value transform takes the text and returns the value. By default, the text is used unchanged.

The new type transform takes the text and returns the type. By default, the type of the rule is used (e.g. identifier).

Example: case-insensitive keywords
This is my preferred solution for #67 / #78.
For example, you can create a customised version of moo.keywords which matches case-insensitively:

Lexer#has()
This unfortunately makes it impossible to write a Lexer#has function, since we can't infer what token names might be returned by this custom function.

This will make Moo incompatible with the current version of Nearley: we introduced has() so that we could tell whether %foo refers to a custom token matcher such as foo = { test: x => Number.isInteger(x) }, or a lexer token. But custom token matchers will likely be removed [from Nearley] going forward, so has() will have no use.
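To make the has() question concrete, here is a rough sketch of the decision a grammar tool has to make for %foo. It is not Nearley's actual code: resolveTokenReference, the lexer object, and customMatchers are hypothetical stand-ins, and the lexer assumes the "always return true" behaviour mentioned in the follow-up comment.

```javascript
// Hypothetical stand-in for a lexer whose type transform can return any
// token name, so has() cannot enumerate the names and answers true for all.
const lexer = {
  has(tokenName) {
    return true
  },
}

// Hypothetical custom token matchers, in the style quoted above.
const customMatchers = {
  foo: { test: x => Number.isInteger(x) },
}

// Hypothetical helper: decide what %name refers to in a grammar.
function resolveTokenReference(name, lexer, customMatchers) {
  if (lexer && lexer.has(name)) {
    return { kind: 'lexer', type: name }
  }
  if (Object.prototype.hasOwnProperty.call(customMatchers, name)) {
    return { kind: 'custom', matcher: customMatchers[name] }
  }
  throw new Error('Unknown token: %' + name)
}

const resolved = resolveTokenReference('foo', lexer, customMatchers)
```

Under these assumptions resolved.kind is 'lexer': once has() always answers true, every %name is treated as a lexer token, and a custom matcher of the same name is only reached when no lexer is supplied.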