no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
BSD 3-Clause "New" or "Revised" License

formatError for undefined tokens #114

Closed tjvr closed 5 years ago

tjvr commented 5 years ago

It's fairly natural to write code of the form:

  var tok // declared outside the loop so it is still visible after it ends
  while (tok = lexer.next()) {
    try {
      parser.eat(tok)
    } catch (err) {
      throw new Error(lexer.formatError(tok, "Syntax error"))
    }
  }

  try {
    var program = parser.result()
  } catch (err) {
    throw new Error(lexer.formatError(tok, "Unexpected EOF")) // Not allowed!
  }
  return program

The second formatError call is not valid, because tok will be undefined here. Moo uses undefined to indicate that there are no more tokens, i.e. we've reached the end of the buffer.

There's no way to get Moo to format an error at the end of the file, after the last token, without manually constructing an EOF token. I propose letting formatError accept undefined and silently interpret it as an EOF token.

Alternatively, we could introduce a lexer.makeEOF() method which returns this end-of-file token directly.
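
For reference, the manual workaround mentioned above could look roughly like the sketch below. makeEOFToken is a hypothetical helper, not part of Moo's API. It assumes that a token-shaped object with Moo's usual fields (type, value, text, offset, lineBreaks, line, col) and 1-based line/col is enough for formatError, which may not hold in every version.

```js
// Hypothetical helper: NOT part of Moo's API.
// Builds a token-shaped object pointing just past the end of `input`.
function makeEOFToken(input) {
  const lines = input.split("\n")
  const lastLine = lines[lines.length - 1]
  return {
    type: "eof",               // illustrative type name (assumption)
    value: "",
    text: "",
    offset: input.length,      // index one past the last character
    lineBreaks: 0,
    line: lines.length,        // 1-based line number
    col: lastLine.length + 1,  // 1-based column, one past the last character
  }
}
```

With something like this, the second call would become lexer.formatError(makeEOFToken(source), "Unexpected EOF"), where source is whatever string was passed to the lexer.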

nathan commented 5 years ago

I don't really have a problem with formatError() interpreting null/undefined as EOF, but it seems confusing and unreadable to write code that uses tok outside of its logical scope to mean the constant undefined. Even though the example above uses a while loop, it reads as a for loop:

for (let tok; tok = lexer.next();) {
  try {
    parser.eat(tok)
  } catch (err) {
    throw new Error(lexer.formatError(tok, "Syntax error"))
  }
}

which makes using it outside of the loop unintuitive and odd. I think it makes more sense to write the second call to formatError() as

lexer.formatError(null, "Unexpected EOF")

or simply

lexer.formatError("Unexpected EOF")

Additionally, perhaps such a call should use the current lexer position rather than always use EOF; it would be odd to call formatError in the middle of the token stream and have the result point to its end.
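
A minimal sketch of what I mean, written as a standalone function rather than a method. This is not Moo's implementation: the lexer argument is assumed to expose its current position as line and col, and the message format is only illustrative.

```js
// Hypothetical sketch, NOT Moo's implementation.
// `lexer` is assumed to expose `line` and `col` for its current position.
function formatError(lexer, token, message) {
  // Support the short form: formatError(lexer, "message")
  if (typeof token === "string" && message === undefined) {
    message = token
    token = null
  }
  // No token given: fall back to wherever the lexer currently is,
  // which is the end of the input only if everything has been consumed.
  const pos = token == null ? { line: lexer.line, col: lexer.col } : token
  return message + " at line " + pos.line + " col " + pos.col
}
```

Called after the loop has finished, the fallback position is the end of the buffer, so the "Unexpected EOF" case still works. Called mid-stream, it points at the current position instead of jumping to the end.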

tjvr commented 5 years ago

Agreed on all points! Thanks 😊


nathan commented 5 years ago

Sounds good. I think my comment above may still be relevant:

"perhaps such a call should use the current lexer position rather than always use EOF; it would be odd to call formatError in the middle of the token stream and have the result point to its end."

nathan commented 5 years ago

Awesome, thanks! LGTM