munificent / craftinginterpreters

Repository for the book "Crafting Interpreters"
http://www.craftinginterpreters.com/
Other
8.84k stars 1.04k forks source link

Scanning numbers with more than one dot(s) does not produce a lexical error #103

Closed spraza closed 7 years ago

spraza commented 7 years ago

This is the code which scans numbers in your implementation of the scanner:

  private void number() {
    while (isDigit(peek())) advance();

    // Look for a fractional part.
    if (peek() == '.' && isDigit(peekNext())) {
      // Consume the "."
      advance();

      while (isDigit(peek())) advance();
    }

    addToken(NUMBER,
        Double.parseDouble(source.substring(start, current)));
  }

Looks like this code will not produce an error for inputs like 0.23.45:

> 0.23.45 0.23 . 45

Clearly, 0.23.45 isn't a valid number and I think there is an opportunity here to report an early lexical error along the lines of "Invalid Number: 0.23.45 at line xyz".

I maybe missing some lox language design detail here but do you think if the scanner can be enhanced to error out for such cases? I was thinking to add an additional check before adding the token that ensures the next character is a space. If not, error out. What do you think? I can test out the change and create a pull request if you think it's a good idea.

munificent commented 7 years ago

This doesn't produce a lexical error, but it does later produce a syntax error in the parser:

Error at '45': Expect property name after '.'.

The error message gives you a hint as to why this isn't a lexical error. Lox itself doesn't take advantage of this (because numbers are primitives and don't have methods), but other similar languages do allow method calls on numbers, even number literals. So while this is an error:

0.23.45

This might not be:

0.23.toString()

The other reason why I think it's better to handle this error in the parser and not the lexer is that this should also be an error:

0.23   .   45

But it would be awkward for the lexer to have to deal with the whitespace and treat that all as a single token. It's simpler to have the lexer recognize the tokens it can and then let the parser yell if the resulting sequence doesn't make sense.

Thanks for asking! This is indeed a curious corner of the lexer. (An even more curious one is the question of whether -123 should be one token or two. :) )

spraza commented 7 years ago

Thanks for the detail explanation! Both of your points totally make sense. The first one (methods on number literals) is something I wasn't too sure about with respect to Lox's design but thanks for clearing that up, too.

By the way, considering -123 as one token or two tokens has tickled my mind.. there goes my next few minutes/hours :smiley: