Closed · blainehansen closed this 6 years ago
Hey, can you explain what your use case is? 🙂
So my desired features fall into two basic categories.
The first, ignored tokens, is simple. By adding the `ignore: true` option to a token rule, any token it matches is discarded instead of being returned from `next`; the lexer instead recursively calls `next` again and returns the first non-ignored token.
```js
const ignoringLexer = compile({
  Dot: '.',
  Bang: '!',
  Space: { match: / +/, ignore: true },
})
const { Dot, Bang, Space } = ignoringLexer.tokenLibrary()

ignoringLexer.reset(" . ! . ")
const tokens = Array.from(ignoringLexer)
expect(tokens).toHaveLength(3)
expect(matchTokens(tokens, [Dot, Bang, Dot])).toBe(true)
```
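A minimal standalone sketch of the skipping behavior described above (this is not moo's actual internals — `makeSkippingLexer` and its token shape are hypothetical, purely to illustrate the recursive `next` idea):

```javascript
// Hypothetical sketch: a lexer whose next() recursively skips tokens
// whose types are marked as ignored, returning the first real token.
function makeSkippingLexer(tokens, ignoredTypes) {
  let index = 0
  return {
    next() {
      if (index >= tokens.length) return undefined
      const token = tokens[index++]
      // an ignored token is discarded; recurse to find the next real one
      if (ignoredTypes.has(token.type)) return this.next()
      return token
    },
  }
}

const lexer = makeSkippingLexer(
  [{ type: 'Space' }, { type: 'Dot' }, { type: 'Space' }, { type: 'Bang' }],
  new Set(['Space'])
)
console.log(lexer.next().type) // 'Dot' -- the leading Space was skipped
console.log(lexer.next().type) // 'Bang'
```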
I'm currently working on an automatic lookahead parser called Kreia. It's obviously convenient to have an abstracted system for matching tokens baked into a token creation library, so `matchToken` and `matchTokens` provide one. The `tokenLibrary` function exposes all the token types for matching, which is better than using a 'stringly typed' system.
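To make the idea concrete, here is a hypothetical sketch of how `matchToken` and `matchTokens` could work — the fork's real helpers may differ. It assumes each token carries a `type` name and a list of category names, and that each expected entry (a token type or a category) is identified by `name`:

```javascript
// Hypothetical sketch: a token matches an expected entry if its type name
// matches, or if the expected entry names one of its categories.
function matchToken(token, expected) {
  return token.type === expected.name
    || (token.categories || []).includes(expected.name)
}

// A token list matches when lengths agree and each token matches positionally.
function matchTokens(tokens, expectedList) {
  return tokens.length === expectedList.length
    && tokens.every((token, index) => matchToken(token, expectedList[index]))
}

// Hypothetical sample data, mirroring the examples in this thread.
const Dot = { name: 'Dot' }
const BinaryOperator = { name: 'BinaryOperator' }
const sampleTokens = [
  { type: 'Dot', categories: [] },
  { type: 'PlusEqual', categories: ['BinaryOperator'] },
]
console.log(matchTokens(sampleTokens, [Dot, BinaryOperator])) // true
console.log(matchTokens(sampleTokens, [Dot, Dot]))            // false
```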
The second feature is categories. They're useful because when parsing, you sometimes want to require a specific token and other times a more general one. You can create a category with the `createCategory` function, optionally giving it multiple parent categories to belong to.
```js
const BinaryOperator = createCategory('BinaryOperator')
const BooleanOperator = createCategory('BooleanOperator', BinaryOperator)

const opLexer = compile({
  Equal: { match: '=', categories: BinaryOperator },
  PlusEqual: { match: '+=', categories: BinaryOperator },
  SubEqual: { match: '-=', categories: BinaryOperator },
  DoubleEqual: { match: '==', categories: BooleanOperator },
  NotEqual: { match: '!=', categories: BooleanOperator },
  Space: { match: / +/, ignore: true },
})
const { Equal, PlusEqual, SubEqual, DoubleEqual, NotEqual } = opLexer.tokenLibrary()

opLexer.reset("= += -= == !=")
const tokens = Array.from(opLexer)
expect(matchTokens(tokens, [Equal, PlusEqual, SubEqual, DoubleEqual, NotEqual])).toBe(true)
expect(matchTokens(tokens, [BinaryOperator, BinaryOperator, BinaryOperator, BinaryOperator, BinaryOperator])).toBe(true)
expect(matchTokens(tokens, [BinaryOperator, BinaryOperator, BinaryOperator, BooleanOperator, BooleanOperator])).toBe(true)
```
There are cases where parsing requires a specific token of a category, and others where any token of the category will do; categories pull that apart. For example, in Python the `*` token can be used for multiplication (a context where almost any binary operator is acceptable), or for `*args`, where it needs to be asked for very specifically.
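The example above relies on category hierarchies: a token in `BooleanOperator` should also count as a `BinaryOperator` because of the parent relationship. Here is a hypothetical sketch of how `createCategory` parent chaining could be resolved — not the fork's actual implementation:

```javascript
// Hypothetical sketch: a category records its parents, and membership
// checks walk the ancestry chain recursively.
function createCategory(name, ...parents) {
  return { name, parents }
}

// True if `category` is `target` or has `target` anywhere in its ancestry.
function inCategory(category, target) {
  if (category === target) return true
  return category.parents.some(parent => inCategory(parent, target))
}

const BinaryOperator = createCategory('BinaryOperator')
const BooleanOperator = createCategory('BooleanOperator', BinaryOperator)

// A hypothetical DoubleEqual token tagged only with BooleanOperator...
const doubleEqual = { type: 'DoubleEqual', categories: [BooleanOperator] }
const matchesCategory = (token, category) =>
  token.categories.some(c => inCategory(c, category))

console.log(matchesCategory(doubleEqual, BooleanOperator)) // true
console.log(matchesCategory(doubleEqual, BinaryOperator))  // true -- inherited via the parent link
```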
Any more thoughts about this? I definitely need to maintain a version of this somewhere and document it for users of my parser engine, but I'd be happy to maintain a forked version with stripped-down documentation.
I'm afraid I don't think this is something that I want to merge into Moo at this time. I'm curious as to why you can't move this logic into your parser, but you are of course welcome to maintain a fork 🙂
What's the parser, OOI?
Okay, I'll maintain a fork :smile:
And the reason I didn't move this into the parser engine is that it needed to intercept moo's compile process, or at least it seemed that way as I wrote it. I'm also using a different test framework and didn't want to duplicate all of moo's existing tests or maintain two testing flows.
The parser is Kreia.
I haven't added any documentation for these functions, but all the old tests and my new ones pass. I also didn't run any of the benchmarking code, and in some places I prioritized code simplicity over speed or browser compatibility. I'm opening this mostly to see if you're interested in these features; if not, I'll use a local copy of this code internally for another project.
Thanks! :smile: