no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
BSD 3-Clause "New" or "Revised" License

added categories, matching functions, ignore #87

Closed: blainehansen closed this 6 years ago

blainehansen commented 6 years ago

I haven't added any documentation for these functions, but all the old tests and my new ones pass.

I also didn't run any of the benchmarking code, and in some places I prioritized code simplicity over speed or browser compatibility.

I'm opening this mainly to see if you're interested in these features. If not, I'll use a local copy of this code internally for another project.

Thanks! :smile:

tjvr commented 6 years ago

Hey, can you explain what your use case is? 🙂

blainehansen commented 6 years ago

So my desired features fall into two basic categories.

ignoring

This one's simple. When a token rule has the ignore: true option, its tokens are discarded instead of being returned from next. next recursively calls itself again and returns the first non-ignored token.

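// compile and matchTokens come from this PR's version of moo
const ignoringLexer = compile({
  Dot: '.',
  Bang: '!',
  // Space tokens are matched but never returned from next
  Space: { match: / +/, ignore: true },
})

// tokenLibrary exposes every token type so it can be used for matching
const { Dot, Bang, Space } = ignoringLexer.tokenLibrary()

ignoringLexer.reset(" . ! . ")
const tokens = Array.from(ignoringLexer)
// only the three non-space tokens come out
expect(tokens).toHaveLength(3)
expect(matchTokens(tokens, [Dot, Bang, Dot])).toBe(true)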
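To make the mechanism concrete, here is a minimal sketch of how an ignore flag can work inside a lexer's next. It is not moo's code and not the code in this PR: compileSketch and everything inside it are hypothetical names, the rules are assumed to be plain regexes, and the only thing borrowed from moo is the idea of sticky (/y) matching.

```javascript
// Hypothetical mini-lexer, NOT moo's implementation. Each rule is recompiled
// as a sticky regex (/y) so that matching starts exactly at lastIndex.
function compileSketch(rules) {
  const compiled = Object.entries(rules).map(([type, rule]) => ({
    type,
    regex: new RegExp(rule.match.source, 'y'),
    ignore: Boolean(rule.ignore),
  }))
  let buffer = ''
  let index = 0
  return {
    reset(input) {
      buffer = input
      index = 0
      return this
    },
    next() {
      if (index >= buffer.length) return undefined
      for (const { type, regex, ignore } of compiled) {
        regex.lastIndex = index
        const match = regex.exec(buffer)
        if (match === null || match[0].length === 0) continue
        const token = { type, value: match[0], offset: index }
        index += match[0].length
        // An ignored token is dropped by asking for the following one instead.
        return ignore ? this.next() : token
      }
      throw new Error(`invalid syntax at offset ${index}`)
    },
    *[Symbol.iterator]() {
      for (let token = this.next(); token !== undefined; token = this.next()) {
        yield token
      }
    },
  }
}

const sketchLexer = compileSketch({
  Dot: { match: /\./ },
  Bang: { match: /!/ },
  Space: { match: / +/, ignore: true },
})
sketchLexer.reset(' . ! . ')
const sketchTypes = Array.from(sketchLexer, (token) => token.type)
console.log(sketchTypes) // [ 'Dot', 'Bang', 'Dot' ]
```

The recursion mirrors the description above. A loop would avoid deep call stacks on inputs with long runs of ignored tokens, which may matter for a library that cares about speed.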

token categories and matching functions

I'm currently working on an automatic lookahead parser called Kreia. It's convenient to have an abstracted system for matching tokens baked into the token creation library itself, and matchToken and matchTokens provide that. The tokenLibrary function exposes all the token types for matching, which is better than using a 'stringly typed' system.

Categories are useful because when parsing, you sometimes want to require a specific token, and other times require a general token. You can create a category with the createCategory function, potentially giving it many parent categories to belong to.

const BinaryOperator = createCategory('BinaryOperator')
// BooleanOperator has BinaryOperator as a parent category
const BooleanOperator = createCategory('BooleanOperator', BinaryOperator)

const opLexer = compile({
  Equal: { match: '=', categories: BinaryOperator },
  PlusEqual: { match: '+=', categories: BinaryOperator },
  SubEqual: { match: '-=', categories: BinaryOperator },
  DoubleEqual: { match: '==', categories: BooleanOperator },
  NotEqual: { match: '!=', categories: BooleanOperator },
  Space: { match: / +/, ignore: true },
})

const { Equal, PlusEqual, SubEqual, DoubleEqual, NotEqual } = opLexer.tokenLibrary()

opLexer.reset("= += -= == !=")
const tokens = Array.from(opLexer)
// match by exact token type
expect(matchTokens(tokens, [Equal, PlusEqual, SubEqual, DoubleEqual, NotEqual])).toBe(true)
// every token is a BinaryOperator, directly or through BooleanOperator
expect(matchTokens(tokens, [BinaryOperator, BinaryOperator, BinaryOperator, BinaryOperator, BinaryOperator])).toBe(true)
// the last two can also be matched by the narrower category
expect(matchTokens(tokens, [BinaryOperator, BinaryOperator, BinaryOperator, BooleanOperator, BooleanOperator])).toBe(true)

There are cases where a specific token of a category is needed for parsing, and other times when any token of a category will do. Categories help pull that all apart. For example, in Python, * can be used either for multiplication (and in that context almost any binary operator will do) or for *args, where it needs to be asked for very specifically.
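The matching rule described here can be sketched in a few lines. This is an illustration only, not the code in this PR: makeCategory, makeTokenType, isMatch and allMatch are hypothetical stand-ins, parents are assumed to be passed as extra arguments, and token types are assumed to be objects rather than strings.

```javascript
// Hypothetical sketch of category matching, NOT the code from this PR.
function makeCategory(name, ...parents) {
  return { kind: 'category', name, parents }
}

function makeTokenType(name, ...categories) {
  return { kind: 'token', name, categories }
}

// True if `category` is `target` or inherits from it through any parent chain.
function categoryIncludes(category, target) {
  return category === target ||
    category.parents.some((parent) => categoryIncludes(parent, target))
}

// A matcher is either a token type (exact match) or a category (any member).
function isMatch(token, matcher) {
  if (matcher.kind === 'token') return token.type === matcher
  return token.type.categories.some((c) => categoryIncludes(c, matcher))
}

function allMatch(tokens, matchers) {
  return tokens.length === matchers.length &&
    tokens.every((token, i) => isMatch(token, matchers[i]))
}

const BinaryOp = makeCategory('BinaryOperator')
const BooleanOp = makeCategory('BooleanOperator', BinaryOp)
const Star = makeTokenType('Star', BinaryOp)
const EqEq = makeTokenType('DoubleEqual', BooleanOp)

const stream = [{ type: Star, value: '*' }, { type: EqEq, value: '==' }]
console.log(allMatch(stream, [BinaryOp, BinaryOp]))   // true
console.log(allMatch(stream, [Star, BooleanOp]))      // true
console.log(allMatch(stream, [BooleanOp, BooleanOp])) // false
```

A parser rule for multiplication could ask for BinaryOp, while a rule for *args could ask for Star itself, using the same matching call in both places.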

blainehansen commented 6 years ago

Any more thoughts about this? I definitely need to maintain a version of this somewhere, and document it for users of my parser engine, but I'd be happy to maintain a forked version with stripped-down documentation.

tjvr commented 6 years ago

I'm afraid I don't think this is something that I want to merge into Moo at this time. I'm curious as to why you can't move this logic into your parser, but you are of course welcome to maintain a fork 🙂
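For the ignore feature at least, moving the logic into the parser could look something like the sketch below. It relies only on two facts about moo: a lexer is iterable, and each token carries a string type. The withoutIgnored helper and the sample token array are hypothetical, and this does not cover categories.

```javascript
// Hypothetical parser-side filter: wraps any iterable of tokens that have a
// string `type` and skips the listed types. The lexer itself is untouched.
function* withoutIgnored(tokens, ignoredTypes) {
  const ignored = new Set(ignoredTypes)
  for (const token of tokens) {
    if (!ignored.has(token.type)) yield token
  }
}

// Stand-in for a lexer's output. With moo this would be the lexer object
// itself, after calling reset on it.
const rawTokens = [
  { type: 'Space', value: ' ' },
  { type: 'Dot', value: '.' },
  { type: 'Space', value: ' ' },
  { type: 'Bang', value: '!' },
]

const keptTypes = Array.from(withoutIgnored(rawTokens, ['Space']), (t) => t.type)
console.log(keptTypes) // [ 'Dot', 'Bang' ]
```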

What's the parser, OOI?

blainehansen commented 6 years ago

Okay, I'll maintain a fork :smile:

The reason I didn't want to move this into the parser engine is that it needed to intercept moo's compile process, or at least it seemed that way as I wrote it. I'm also using a different test framework, and I didn't want to duplicate all of moo's existing tests or have two testing flows.

The parser is Kreia.