janet-lang / janet

A dynamic language and bytecode vm
https://janet-lang.org
MIT License

parse error raised (invalid escape sequence) before string is terminated #1166

Open iacore opened 1 year ago

iacore commented 1 year ago

On parse error, the REPL state is not reset.

terminal trace:

> janet
Janet 1.28.0-meson linux/x64/gcc - '(doc)' for help
repl:1:> (print "\b")
repl:1:10: parse error: invalid string escape sequence
repl:2:"> 
repl:2:0: parse error: unexpected end of source, " opened at line 1, column 11

Note the repl:2:">. It should be repl:2:>, without the ".

bakpakin commented 1 year ago

That is not exactly what is happening. It is confusing, but here is what is going on:

  1. Parser sees " - and starts a string.
  2. Parser sees "\b" and recognizes that as an error.
  3. An error is thrown immediately and printed to the screen. The parser state is correctly reset.
  4. Parsing resumes. The trailing " in the input buffer is interpreted as a new string.
  5. The next line is now parsed as the contents of this new string.

Now this is confusing but consistent with how other parser errors are handled.
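
The five steps above can be reproduced with a toy model. The sketch below is not Janet's parser; it is a hypothetical character-at-a-time reader written in Python that understands only double-quoted strings and a small, assumed subset of escapes. The names `feed` and `VALID_ESCAPES` are made up for illustration.

```python
# Toy model only: NOT Janet's parser. It reads one character at a time and
# understands just double-quoted strings. All names here are hypothetical.
VALID_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}  # assumed subset


def feed(text):
    """Consume text character by character; return (events, final_state)."""
    events = []
    in_string = False
    escaped = False
    buf = []
    for ch in text:
        if not in_string:
            if ch == '"':
                in_string = True  # step 1 (and step 4): a quote opens a string
                buf = []
            continue  # everything else outside a string is ignored here
        if escaped:
            escaped = False
            if ch in VALID_ESCAPES:
                buf.append(VALID_ESCAPES[ch])
            else:
                # steps 2-3: report immediately and reset to the top level
                events.append(("error", "invalid string escape sequence"))
                in_string = False
            continue
        if ch == "\\":
            escaped = True
        elif ch == '"':
            events.append(("string", "".join(buf)))
            in_string = False
        else:
            buf.append(ch)
    state = "in-string" if in_string else "top-level"
    return events, state


events, state = feed(r'(print "\b")')
print(events)  # one error event
print(state)   # the trailing quote has opened a new, unterminated string
```

In this model the reader ends in the "in-string" state after the input `(print "\b")`, which corresponds to the `"` shown in the `repl:2:">` prompt: the next line typed would be read as string contents (step 5).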

iacore commented 1 year ago

Shouldn't the REPL stop parsing on the first parse error? Like (parse "\"a\\b\"")

bakpakin commented 1 year ago

Shouldn't REPL stop parsing on first parse error? like (parse "\"a\\b\"")

No, since a user at a repl would want to continue after any errors to enter more data. More generally, with some (bad) code snippet like:

(def a 123) ) (+ 1 2 3)

(Note the extra closing parenthesis.) The repl will print both 123 and 6, with an error in between them.
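
As a rough illustration of "report the error and keep going", here is a toy Python reader (not Janet's implementation) that splits input into top-level parenthesised forms and records a stray closing parenthesis as an error without stopping. The function name `read_forms` and the error wording are assumptions made for this sketch; it does not evaluate anything.

```python
# Toy model only: splits text into top-level (...) forms, one character at a
# time, and keeps reading after a stray ')'. Names and messages are made up.
def read_forms(text):
    events = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # remember where this top-level form begins
            depth += 1
        elif ch == ")":
            if depth == 0:
                # Nothing is open: record the error, then carry on reading.
                events.append(("error", f"unexpected ')' at column {i + 1}"))
                continue
            depth -= 1
            if depth == 0:
                # A complete form; a real REPL would compile and evaluate it now.
                events.append(("form", text[start:i + 1]))
    return events


for event in read_forms("(def a 123) ) (+ 1 2 3)"):
    print(event)
# ('form', '(def a 123)')
# ("error", "unexpected ')' at column 13")
# ('form', '(+ 1 2 3)')
```

The order of events matches the description: the first form is produced, the error is reported, and the second form is still produced afterwards.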

iacore commented 1 year ago

I mean, the REPL could print the error and not do anything until the user fixes the syntax error. For example, Python:

> python
Python 3.11.3 (main, Apr  7 2023, 00:46:44) [GCC 12.2.1 20230201] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print(1); printa)
  File "<stdin>", line 1
    print(1); printa)
                    ^
SyntaxError: unmatched ')'
>>> 

bakpakin commented 1 year ago

Lisp is in general different from Python - Python will read a whole line, parse it, and then evaluate it. If the line is part of a block (ends with some sort of continuation like a :), then it will keep reading lines until the block is completed and only then do more parsing, and evaluate the whole block.

Janet is not line based, and does not process your programs in chunks of lines; the parser (reader) goes character by character. When a form is produced, it is compiled and then evaluated. The subsequent forms are not touched.

I mean, the REPL can print the error, and don't do anything until the user fixed the syntax error. For example, Python

Perhaps, but also keep in mind that the repl is the same environment as running from a file. The reason execution keeps going after a syntax error is to get as much diagnostic information as possible. For example, if there is an error at the beginning of a source file, you would also like to get as many subsequent errors in the output as possible rather than needing to rerun the program once per error.

You can use Ctrl-Q at the repl to cancel the current form though.

bakpakin commented 1 year ago

So the confusing part of this is that the error is thrown before the string is completely read, which leaves the parser in a confusing state. On a bad string escape, I think more consistent behavior would be to wait until the closing quote is found before raising a parsing error.
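
The proposed behavior can be sketched with a toy reader. This is not Janet's parser and not an actual patch; it is a hypothetical Python model that handles only double-quoted strings with an assumed subset of escapes. It remembers a bad escape and raises the error only once the closing quote has been consumed. The names `feed_deferred` and `VALID_ESCAPES` are made up for illustration.

```python
# Toy model only: NOT Janet's parser. A bad escape is remembered and reported
# when the closing quote arrives, so the whole string is consumed first.
VALID_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}  # assumed subset


def feed_deferred(text):
    """Consume text character by character; return (events, final_state)."""
    events = []
    in_string = False
    escaped = False
    bad_escape = False
    buf = []
    for ch in text:
        if not in_string:
            if ch == '"':
                in_string, bad_escape, buf = True, False, []
            continue  # everything else outside a string is ignored here
        if escaped:
            escaped = False
            if ch in VALID_ESCAPES:
                buf.append(VALID_ESCAPES[ch])
            else:
                bad_escape = True  # remember it, but keep reading the string
            continue
        if ch == "\\":
            escaped = True
        elif ch == '"':
            if bad_escape:
                events.append(("error", "invalid string escape sequence"))
            else:
                events.append(("string", "".join(buf)))
            in_string = False  # the closing quote was consumed either way
        else:
            buf.append(ch)
    state = "in-string" if in_string else "top-level"
    return events, state


events, state = feed_deferred(r'(print "\b")')
print(events)  # one error event
print(state)   # top-level: no dangling quote is left in the input
```

In this model the same input `(print "\b")` still yields one error, but the reader finishes at the top level, so the next prompt would not show an open string.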

iacore commented 1 year ago

You can use Ctrl-Q at the repl to cancel the current form though.

Thanks! This helps.