munificent / craftinginterpreters

Repository for the book "Crafting Interpreters"
http://www.craftinginterpreters.com/
Other
9.03k stars 1.05k forks source link

[Discussion] Accepting multi-line input in a REPL #799

Open aakshintala opened 4 years ago

aakshintala commented 4 years ago

The Lox REPL as described in the book (and as implemented in clox and jlox) doesn't support multi-line input. The code listings at the end of Chapter 8 and 9 all span multiple lines though, and attempting run them on clox/jlox/my implementation predictably results in parser errors.

I wondered what it would take to implement support for multi-line input in a REPL, and found that this is one of those classic trade-off style design questions. I think this would make for an interesting design aside in the book!

This problem fascinated me, so I spent the evening thinking up some solutions. Here are some simple designs I came up with: (I'm sure there are other more elegant solutions I haven't thought of...)

  1. Building a special multi-line input mode into the REPL. This quickly devolves into writing a small text editor (perhaps this is why there are so many editors?)
  2. Writing a simple brace/bracket matcher that determines if the input has an unbalanced brace or bracket. This could be implemented over the tokens returned by the scanner. When an unbalanced bracket is detected, the REPL keeps waiting for input unless it sees 2 newlines. The real issue with this approach is that we're essentially introducing parts of the language parsing logic into the REPL. Slippery slope. Probably worth it as a simple optimization in an industrial strength language interpreter, though?
  3. Throwing a special exception for unmatched brackets, double quotes, and missing semicolons (perhaps others?) from the parser that can be caught and handled by the REPL driver as a signal to capture multi-line input. The most efficient way of implementing this would be to pass each new line through the scanner only once. The generated tokens would then be appended to those from earlier lines and this entire set would pass through the parser each time until the parser is satisfied or complains about something else entirely. Again, two new lines in a row could also indicate end of input from the REPL user (useful in the case the user wants to error out). This approach preserves the separation of concerns, but results in the parser being invoked multiple times. Overall, given that human input is probably much slower than the parser's operation time (evidenced by the wide adoption of linters), this approach represents (IMO) the most reasonable trade-off for an industrial-strength interpreter.
nocturn9x commented 4 years ago

We could try and look at other implementations of multi line REPLs (see Python) and maybe implement something similar or come up with an even better solution, once I'm done with the bytecode VM I'll try implementing your second proposal, great issue!

BigZaphod commented 4 years ago

I haven't thought a lot about this yet, but I'd also like to hack in some basic multi-like support.

I think one possible way would be to make some changes to the compile() function so that it had a way of reporting back something along the lines of "is the scope back down to 0 or not" and also had a way to suppress errors.

That way, if you could call it while suppressing errors, and then the result is NULL with a final scope that is not 0, then just ignore those suppressed errors and perhaps print a different kind of prompt to indicate that you're in a "continue" mode to the user and let them keep going. Every time they hit enter, just compile their whole input string again (including what failed to compile last time). Whenever the compile() function says the scope is 0 and there's a valid function, then run it. If the scope is 0 but there's not a valid function, then perhaps just compile the input again without suppressing errors this time so the user can see what went wrong.

BigZaphod commented 4 years ago

I came up with a rough implementation that seems to be working okay. I'm working on a variant of clox at this point, so keep that in mind.

It's hard to post my code here because I've deviated from the book's version a fair bit by now. So I guess I'll just describe my basic approach in case it's useful?

First, I needed a way to stop the compiler and runtime errors from printing if the user had typed in a half-completed bit of code and hit enter.

Next, I needed a way to detect when the code that the user entered was at least somewhat likely to have been simply incomplete rather than a complete syntactic failure.

I ended up solving the second problem in what I think ended up being a pretty simple way. The Parser struct had been using a bool for panicMode. I changed that to an enum like this instead:

typedef enum {
    PARSER_OK,
    PARSER_EOF,
    PARSER_PANIC,
} PanicMode;

typedef struct {
    Token current;
    Token previous;
    bool hadError;
    PanicMode panicMode;
} Parser;

In the compiler's errorAt() function I made some modifications so it starts like this:

    if (parser.panicMode != PARSER_OK) {
        return;
    }

    // Make sure we remember that we've had an error.
    parser.panicMode = (token->type == TOKEN_EOF) ? PARSER_EOF : PARSER_PANIC;
    parser.hadError = true;

Everywhere that used to check for panicMode == true I just changed to check specifically for PARSER_PANIC instead.

The end result of all of this is that when the compiler finishes, if the very first error it hit was an EOF, then the panicMode will be left set as PARSER_EOF. I can then use that as a signal that the user's input was probably just incomplete.

So with that problem at least somewhat solved, I had to get back to the first problem of hiding the compiler's errors in this case while not losing them entirely in case the input was completed and I needed to show it.

To solve that, I replaced all of the hardcoded stderrs in the source to instead reference a different global FILE* that I named errorStream. In initVM() I set errorStream = stderr; at first so nothing changed initially.

In my repl() function in main.c, I used the open_memstream() function to make a write-only stream and then assigned it to errorStream. This way, everything that the compiler and VM used to print to stderr was now instead captured by this stream and written into a char * for later.

After the user inputs a string and fgets() returns, I immediately append the line to a new char* buffer that has to grow dynamically. Then I feed that whole buffer into interpret(). If the interpret() call results in a compiler error, I check to see if parserMode == PARSER_EOF and if it does I close and free my custom errorStream and then re-open it fresh. This gets rid of the errors without ever showing them to the user. Then the loop loops around and simply waits for another line.

In all other cases I print whatever errorStream had captured and then clear the input and error buffers and loop back around and start again with new fresh input.

So anyway, I know a bunch of code would probably be more clear - but that's the approach I'm using at the moment and it seems to work well enough for now!

munificent commented 4 years ago

Your solution is very similar to what I've done before too, @BigZaphod. Basically just try to parse the line and if the parser runs out, then assume it's a multi-line input, read another line, and try again. It feels a little hacky, but... REPLs tend to get a little hacky like this in my experience. :)

aakshintala commented 4 years ago

@munificent could you add a discussion label to this one as well?

Thanks for all the hard work! I just finished Part I, and I've not had this much fun in a long time.

mcfriend99 commented 4 years ago

What I did was simply scrap the repl from the C source and implemented it in my Lox deviate directly. Since I already support lists, vectors, generics, dictionary, exceptions and likes as well as imports. It was really easier to just move the implementation of the REPL to the new language.

degustaf commented 3 years ago

@mcfriend99 I would love to see the code where you did that. That sounds like such a cool way to implement the REPL

nocturn9x commented 3 years ago

if you're interested, my collaborator made a line editing library for the language we're working on (based on clox) here. It's handy because it doesn't require any modification to the parser whatsoever

nocturn9x commented 3 years ago

What I did was simply scrap the repl from the C source and implemented it in my Lox deviate directly. Since I already support lists, vectors, generics, dictionary, exceptions and likes as well as imports. It was really easier to just move the implementation of the REPL to the new language.

Do you mind sharing your implementation?

jido commented 2 years ago

I ran into that as soon as I added blocks at the end of chapter 8.

This is how I resolved it. In Lox.java:

  private static void runPrompt() throws IOException {
    InputStreamReader input = new InputStreamReader(System.in);
    BufferedReader reader = new BufferedReader(input);
    StringBuffer program = new StringBuffer();

    for (;;) { 
      System.out.print(program.length() == 0 ? "lox> " : " ... ");
      String line = reader.readLine();
      if (line == null) break;
      if (line.equals("")) {
        run(program.toString());
        program = new StringBuffer();
        hadError = false;
      }
      else {
        program.append(line).append('\n');
      }
    }
    run(program.toString());
  }

It simply waits for an empty line before it runs the program. Usually you would not need empty lines in a REPL, but it's a simple solution that works well.