no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
BSD 3-Clause "New" or "Revised" License

Allow transforming values #63

Closed tjvr closed 7 years ago

tjvr commented 7 years ago

Now that we've dropped capturing groups, adding some sort of value transform seems more compelling. Of course, you can currently do this outside Moo; but where multiple rules have the same tokenType but different match patterns, we can't tell--from outside Moo--which transform to apply!

Together with #62, this allows us to have several rules with the same token type but different match patterns and transform functions. Motivating example:

  STRING: [
    {match: /"""[^]*?"""/, lineBreaks: true, getValue: v => v.slice(3, v.length - 3)},
    {match: /"(?:\\["\\rn]|[^"\\])*?"/, lineBreaks: true, getValue: v => v.slice(1, v.length - 1)},
    {match: /'(?:\\['\\rn]|[^'\\])*?'/, lineBreaks: true, getValue: v => v.slice(1, v.length - 1)},
  ],

(In practice you'd also want to handle backslash escapes inside your getValue function.)
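
As a rough illustration of that escape handling, a transform could strip the delimiters and then decode the escapes the patterns permit. The helper name `unquote` is made up for this sketch and is not part of Moo; it assumes single-character delimiters and only the escapes `\\`, `\"`, `\'`, `\r` and `\n`, which is what the two quoted-string regexes above accept.

```javascript
// Hypothetical helper (not part of Moo): strip one-character quotes and
// decode the backslash escapes allowed by the quoted-string patterns.
const ESCAPES = new Map([
  ['r', '\r'],
  ['n', '\n'],
]);

function unquote(text) {
  const body = text.slice(1, -1); // drop the opening and closing quote
  // Any escaped character without a special meaning maps to itself,
  // which covers \\, \" and \'.
  return body.replace(/\\(.)/g, (_, ch) => (ESCAPES.has(ch) ? ESCAPES.get(ch) : ch));
}
```

A function like this could be passed as the per-rule transform in place of the plain `slice` calls for the single- and double-quoted rules.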

Surprisingly, this doesn't seem to hurt perf too badly.

tjvr commented 7 years ago

@nathan

  1. Is this a good idea?
  2. Can you suggest a better name than getValue? :-)

nathan commented 7 years ago

> Is this a good idea?

Instinct says "function calls are slow", but it's probably fine. We're not forcing anyone to use it.

> Can you suggest a better name than getValue? :-)

These come to mind:

  { getValue: x => x.slice(1, -1) } // for comparison
  { value: x => x.slice(1, -1) }
  { transform: x => x.slice(1, -1) }

tjvr commented 7 years ago

Okay. I'll go with value. :-)
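
To make the point about per-rule transforms concrete, here is a minimal sketch of a lexer in which two rules share a token type but carry different `value` functions. It is not Moo's implementation: `makeLexer` and the flat rule list with a `type` field are assumptions made for this example, and it simply tries each sticky (`/y`) regex in order.

```javascript
// Minimal sketch, not Moo's implementation: each rule carries its own
// optional `value` transform, applied only when that rule matches.
function makeLexer(rules) {
  // Recompile every pattern with the sticky flag so it can only match
  // exactly at `lastIndex`.
  const compiled = rules.map(rule => ({
    ...rule,
    regex: new RegExp(rule.match.source, 'y'),
  }));

  return function* lex(input) {
    let offset = 0;
    while (offset < input.length) {
      let matched = false;
      for (const { type, regex, value } of compiled) {
        regex.lastIndex = offset;
        const m = regex.exec(input);
        if (m === null || m[0].length === 0) continue;
        const text = m[0];
        // The transform is optional; rules without one keep the raw text.
        yield { type, text, value: value ? value(text) : text, offset };
        offset += text.length;
        matched = true;
        break;
      }
      if (!matched) throw new Error(`invalid syntax at offset ${offset}`);
    }
  };
}

// Rule order matters here: the triple-quoted rule must be tried first.
const lex = makeLexer([
  { type: 'STRING', match: /"""[^]*?"""/, value: s => s.slice(3, -3) },
  { type: 'STRING', match: /"(?:\\["\\rn]|[^"\\])*?"/, value: s => s.slice(1, -1) },
  { type: 'WS', match: /[ \t]+/ },
]);

const tokens = [...lex('"""a"b""" "c"')];
```

Both string tokens come out with type `STRING`, yet each has had the transform of the rule that matched it applied, which is the information a caller outside the lexer would not have.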

nathan commented 7 years ago

> Okay. I'll go with getValue. :-)

It appears you went with value?

tjvr commented 7 years ago

That was what I meant to say, oops. :P