no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
BSD 3-Clause "New" or "Revised" License
817 stars 65 forks source link

Improved error reporting #129

Closed airportyh closed 2 years ago

airportyh commented 4 years ago

This PR aims to improve the error message displayed to the user when a syntax error is encountered by the lexer. In particular:

This PR is related to https://github.com/kach/nearley/pull/459.

Screenshot:

Screen Shot 2019-08-01 at 3 54 43 PM
tjvr commented 4 years ago

Attaches the errored token to the thrown error object, eventhough it's not a "good" token

I'm not convinced about this, either; if you want the error token, can't you define one explicitly?

airportyh commented 4 years ago

I appreciate the desire to improve error reporting in parsers, but I'm not convinced that moo should return extra lines of the input.

Could you elaborate? Do you have concerns about effects that the change could cause? Or don't think the issue is important? I'd be happy to work on the code more to make it satisfactory to you, but if you are not interested at all to merge this then I will leave it alone.

airportyh commented 4 years ago

Attaches the errored token to the thrown error object, eventhough it's not a "good" token

I'm not convinced about this, either; if you want the error token, can't you define one explicitly?

I wanted to display the character that the lexer errored on within the parser error message. This was not possible without reaching in to the non-public api of the lexer (as specified on the Nearley website) to access its current index and buffer. I felt that passing this token back is slightly better than doing that. Open to other suggestions though.

nathan commented 4 years ago

@airportyh

Screen Shot 2019-08-01 at 3 54 43 PM

It looks like your terminal emulator is treating the unicode arrow as either fullwidth or proportional width, as many emulators do, so it only sort of lines up with the character it's trying to point to. It's better just to use a caret/circumflex (^) because that's pretty much guaranteed to line up, even if it doesn't look quite as arrow-y.

I find context lines in error messages quite useful, and line numbers marginally so. However, you've implemented context wrong: you should give n lines on both sides of the problematic one, not just n lines before. It's much more useful to see this:

5  function checkPerson(name) {
6    const person = findPerson(name)
7    const result process(person)
                  ^
8    if (result.extremities > 10) return false
9    try {

than this:

3    return lizard
4  }
5  function checkPerson(name) {
6    const person = findPerson(name)
7    const result process(person)
                  ^

(See, e.g., the --unified or --context options of diff or django's debug tracebacks.) The blank line required for the pointer also draws your eye straight to the (in)correct line instead of forcing you to scan to the bottom to find the one that caused the error.

Another formatting nit: there should be at least two columns between line numbers and code: two spaces, a full stop and a space, or something similar.

@tjvr I don't have a problem with making this the default behavior, but if you'd like to be consistent with previous versions and/or keep error messages less bulky, perhaps we could make it an option to formatError.

tjvr commented 4 years ago

I agree with all of Nathan's comments. Since he's happy to make this the default, so am I. I think I'd rather have bulkier error messages for everyone than introduce extra arguments to formatError. I think one or two lines before and after the error would be sufficient. :)

As above, I'd rather not attach the token object to the error message, but it sounds like we have to.

airportyh commented 4 years ago

Nice! I appreciate the feedback. I plan to address the mentioned issues soon. Here I'll summarize what I am planning to do:

If either of you would like to change this list please give me a heads up. I plan to start work some time tomorrow.

nathan commented 4 years ago

throw or crop? I can consult other well-known implementations

Neither. Return the original string, like String.prototype.padStart.

airportyh commented 4 years ago

throw or crop? I can consult other well-known implementations

Neither. Return the original string, like String.prototype.padStart.

Okay.

airportyh commented 4 years ago

New code is in! Please take a look.

airportyh commented 4 years ago

@tjvr @nathan any question on this? Let me know if there's anything I can do to help.

TheGrandmother commented 3 years ago

I would love to see this merged....

      if (err.stack.match(/.*moo\.js.*/)) {
        const m = err.message.match(/.*line (\d+) col (\d+)/)
        const newErr = new CompilerError(err.message)
        err.position = {line: parseInt(m[0]), col: parseInt(m[1])}
        throw newErr
      }

This is the sketchy workaround I had to use to differentiate lexer errors and get the position out. :P