no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
BSD 3-Clause "New" or "Revised" License

Keyword order #180

Closed by andraaspar 1 year ago

andraaspar commented 1 year ago

In #159 @tjvr mentioned that moo relies on the order that keys are retrieved from a JavaScript object. That's unexpected... I thought the order of keys in a JavaScript object was undefined as per the spec and so this really surprised me. Looking into it, I see that since ES2015 there is an order specified in the spec, albeit a complicated one.
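For reference, the ordering in question can be checked with a short snippet. This is only a sketch: the object below is a made-up example, not a real moo rule set. As far as I can tell, the order for `Reflect.ownKeys` is what ES2015 specified, while `Object.keys` and `for...in` were only required to follow it in later editions, although engines already did so in practice.

```javascript
// Own-key order for ordinary objects:
//   1. integer-like keys, in ascending numeric order
//   2. other string keys, in insertion order
//   3. symbol keys, in insertion order
const sym = Symbol('s');
const example = {
  TERM: 1,
  [sym]: 2,
  10: 3,
  PREFIXTERM: 4,
  2: 5,
};

const stringKeys = Object.keys(example);  // string keys only
const allKeys = Reflect.ownKeys(example); // string keys, then symbols

console.log(stringKeys); // [ '2', '10', 'TERM', 'PREFIXTERM' ]
console.log(allKeys.length); // 5, with the symbol last
```

So names like `TERM` and `PREFIXTERM` keep their source order, and only integer-like or symbol names would be moved.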

It would be best to spell out this feature in the documentation, or, maybe, since the order is significant, moo should use an array instead of an object:

moo.compile({
  TERM: /[a-z]+/,
  PREFIXTERM: /\*|(?:[a-z]+\*)/,
})

would become:

moo.compile([
  ["TERM", /[a-z]+/],
  ["PREFIXTERM", /\*|(?:[a-z]+\*)/],
])

And that would be explicit about the order being significant.
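To show why the order matters for these two rules, here is a minimal sketch of a first-match-wins tokenizer built on one sticky (`/y`) regular expression. The `tokenize` helper is hypothetical and is not moo's implementation; it only assumes that earlier rules win when several could match at the same position, and that the rule patterns contain no capturing groups of their own.

```javascript
// Hypothetical helper: rules is an array of [name, RegExp] pairs.
function tokenize(rules, input) {
  const names = rules.map(([name]) => name);
  // One capturing group per rule; alternation tries them left to right.
  const combined = new RegExp(
    rules.map(([, re]) => `(${re.source})`).join('|'),
    'y'
  );
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    combined.lastIndex = pos;
    const m = combined.exec(input);
    if (m === null || m[0].length === 0) {
      throw new Error(`no rule matches at offset ${pos}`);
    }
    // The first defined group tells us which rule matched.
    const group = m.findIndex((g, i) => i > 0 && g !== undefined);
    tokens.push({ type: names[group - 1], value: m[0] });
    pos = combined.lastIndex;
  }
  return tokens;
}

const TERM = ['TERM', /[a-z]+/];
const PREFIXTERM = ['PREFIXTERM', /\*|(?:[a-z]+\*)/];

const termFirst = tokenize([TERM, PREFIXTERM], 'abc*');
const prefixFirst = tokenize([PREFIXTERM, TERM], 'abc*');

console.log(termFirst.map((t) => `${t.type}:${t.value}`));   // [ 'TERM:abc', 'PREFIXTERM:*' ]
console.log(prefixFirst.map((t) => `${t.type}:${t.value}`)); // [ 'PREFIXTERM:abc*' ]
```

With `TERM` first, `abc*` splits into two tokens; with `PREFIXTERM` first, it is a single token.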

nathan commented 1 year ago

> complicated

As long as you're not using very weird (i.e. integer, symbol, or duplicated) token names, the order has always been the order they appear in the source code, even before this was required by the specification.

If using basic language features makes you uncomfortable, moo already accepts an array of rules:

moo.compile([
  {type: 'TERM', match: /[a-z]+/},
  {type: 'PREFIXTERM', match: /\*|(?:[a-z]+\*)/},
])

This should probably be documented.
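To make the correspondence between the two notations concrete, here is a small sketch that rewrites object-style rules as array-style rules. `toArrayRules` is a hypothetical helper, not part of moo, and it only covers the simple name-to-RegExp shape used above.

```javascript
// Hypothetical helper: { NAME: /re/ } -> [{ type: 'NAME', match: /re/ }]
// Relies on Object.entries returning keys in source order, which holds
// for ordinary (non-integer, non-symbol) names.
function toArrayRules(objectRules) {
  return Object.entries(objectRules).map(([type, match]) => ({ type, match }));
}

const objectRules = {
  TERM: /[a-z]+/,
  PREFIXTERM: /\*|(?:[a-z]+\*)/,
};
const arrayRules = toArrayRules(objectRules);

console.log(arrayRules.map((rule) => rule.type)); // [ 'TERM', 'PREFIXTERM' ]
```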

andraaspar commented 1 year ago

@nathan Thank you for taking the time to answer my question. That snippet is exactly the thing I was looking for.

Edit: What does this snippet become?

    moo.compile({
      IDEN: {match: /[a-zA-Z]+/, type: moo.keywords({
        KW: ['while', 'if', 'else', 'moo', 'cows'],
      })},
      SPACE: {match: /\s+/, lineBreaks: true},
    })

> If using basic language features makes you uncomfortable

I am sorry if I offended you somehow.

nathan commented 1 year ago

You could do

moo.compile([
  {match: /[a-zA-Z]+/, type: moo.keywords({
    KW: ['while', 'if', 'else', 'moo', 'cows'],
  }), defaultType: 'IDENT'},
  {match: /\s+/, lineBreaks: true, type: 'SPACE'},
])

but unless you actually have multiple rules for the same type that can't be combined, just use an object. Array rules are more verbose, harder to read, and exactly equivalent to object rules.
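For readers unfamiliar with the `type` function plus `defaultType` combination, the idea can be pictured roughly as follows. Both `keywordsSketch` and `resolveType` are hypothetical stand-ins written for illustration; they assume that a keyword lookup returns a type for known words and nothing otherwise, in which case `defaultType` is used. This is not moo's actual code.

```javascript
// Hypothetical stand-in for moo.keywords: builds a text -> type lookup.
function keywordsSketch(groups) {
  const lookup = new Map();
  for (const [type, words] of Object.entries(groups)) {
    for (const word of words) lookup.set(word, type);
  }
  return (text) => lookup.get(text); // undefined for non-keywords
}

const identRule = {
  match: /[a-zA-Z]+/,
  type: keywordsSketch({ KW: ['while', 'if', 'else', 'moo', 'cows'] }),
  defaultType: 'IDENT',
};

// Hypothetical resolution step: use the type function if it yields a
// type for this text, otherwise fall back to defaultType.
function resolveType(rule, text) {
  return (typeof rule.type === 'function' && rule.type(text)) || rule.defaultType;
}

console.log(resolveType(identRule, 'while'));  // KW
console.log(resolveType(identRule, 'whilst')); // IDENT
```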

@tjvr is there a reason we don't accept defaultType (or something similar) in array rules?

tjvr commented 1 year ago

Do we not? A quick glance at the code suggests we do 🤔

nathan commented 1 year ago

Oops, we do. My mistake.

We also accept the somewhat bizarre

moo.compile([
  {match: /[a-zA-Z]+/, type: 'NOT_IDENT', defaultType: 'IDENT'},
  {match: /\s+/, lineBreaks: true, type: 'SPACE'},
])

which perhaps we should not.