Up for discussion: Move the "main runtime" of es out of C and into es script

jpco commented 9 months ago

(I'm happy to let this change linger as a PR. I just wanted to share it with folks because I think it's really interesting. It's intended to be an invisible change for most users.)

So this started as a simple experiment asking "how much code, both in terms of LOC and 'interesting' behaviors, can you rip out of main.c?" It turns out that the answer is a LOT -- I was able to get main.c down to 50 lines, and those lines are nearly all just unconditional function calls. But the changes to enable that are a lot more interesting.

Essentially, what happened here is that the logical flow of es, from setup to REPL, has been inverted from a "C code that calls into es script for user customizability" model to an "es script that calls into C code as its execution platform" model. To illustrate what I mean, consider the basic logical startup flow when es is executed as an interactive login shell.

In "normal" es, it looks something like this:

main()                                        (main.c)
 - early setup and argument parsing
 - argument-dependent setup
 - if login, call
   runesrc()
    - open ~/.esrc and call
      runfd()                                (input.c)
       - setup input
       - define %dispatch
       - pick %batch-loop as the REPL
       - with the input, call
         %batch-loop                      (initial.es)
 - pick runfd(stdin) based on args            (main.c)
 - call
   runfd()                                   (input.c)
    - setup input
    - define %dispatch
    - pick %interactive-loop as the REPL
    - with the input, call
      %interactive-loop                   (initial.es)

Most of the interesting stuff happens in main.c and input.c, with just brief detours into hook functions.

In this experimental es, here's how it goes:

main()                                        (main.c)
 - early setup, listify argv
 - look up and call
   %main                                  (runtime.es)
    - argument parsing
    - argument-dependent setup
    - set $runflags, which calls
      set-runflags
       - define %dispatch
    - if login, call
      . ~/.esrc
       - set $runflags, which calls
         set-runflags
          - define %dispatch
       - call
         %run-file
          - pick %batch-loop as the REPL
          - call
            $&runfile                     (prim-etc.c)
             - setup input
             - call the REPL picked earlier
               %batch-loop                (runtime.es)
    - based on the args, call
      %run-file
       - pick %interactive-loop as the REPL
       - call
         $&runfile                        (prim-etc.c)
          - setup input
          - call the REPL picked earlier
            %interactive-loop             (runtime.es)

Nearly everything in this flow has moved into the new runtime.es file, which is sourced by initial.es at dump time. The argument parsing, the startup logic, the invocation of the REPLs, and the definition of %dispatch are all here. A person with a strong understanding of es semantics doesn't need to touch a single line of C to follow the path. They could even make a new %login-loop REPL and invoke it when the shell is a login shell (by editing %run-file and %run-string), or change how %dispatch works (by editing set-runflags), or create a new .bashrc-style "on interactive startup" script (by editing es:main), all within runtime.es. This is, to me, very exciting -- talk about an extensible shell!

This is also at least somewhat related to the direction Paul Haahr wanted to go in, as excerpted from this mailing list post:

The internal function runfd() should probably be exposed [...]. When I was last actively working on es (over a year ago), I know that I was thinking about the read-eval-print loop and how to make it more flexible, based on comments from the list[...]

There are a couple other novelties here that I also find very interesting. The first is the $runflags variable. This variable is how the -eivxnlLGI flags are implemented (and could be how more flags are implemented as well). It corresponds somewhat with the $- variable in some other shells, but because of some of the strengths of es, instead of being formatted as a sneeze-like "himBHs", it reads like interactive login. So, for example, if a bit of code wants to know if it is running in an interactive context, it tests ~ $runflags interactive -- in fact, the %is-interactive function has been rewritten to just this test.
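A minimal sketch of that test, assuming the $runflags variable from this PR (stock es has no such variable). The body of %is-interactive is the one-line test described above; the if block is only a hypothetical use, such as something a user might put in ~/.esrc.

```es
# Per the description above, %is-interactive reduces to a
# list-membership test on $runflags (a variable this PR introduces).
fn %is-interactive {
	~ $runflags interactive
}

# Hypothetical use: act only when both words are present in the list.
if {~ $runflags interactive && ~ $runflags login} {
	echo 'interactive login shell'
}
```

Because $runflags is an ordinary es list, the usual pattern-matching operator is enough; no string slicing of a packed flag word is needed.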

Something that differentiates $runflags, though, is that because it works via set-runflags and $&setrunflags, it can be changed in the middle of shell execution. This means a user can run runflags = $runflags echoinput, and the shell will start behaving as if it had been called with -v. A script could also, to protect critical sections, do this:

# ok-to-fail commands go here
local (runflags = $runflags exitonfalse) {
  # must-not-fail commands go here
}
# ok-to-fail commands go here

Combined with the exception-based exitonfalse behavior in #73, this seems like it could make for a powerful, flexible error-handling pattern.
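The settor mechanism that $runflags relies on can be sketched with an ordinary variable. The names mode and set-mode below are hypothetical and chosen only for illustration; the real set-runflags presumably hands the new value to $&setrunflags rather than echoing it. In stock es, the settor should run when the local binding is made and again when it is undone, which is what would let the flags revert after a critical section.

```es
# Assigning to `mode` calls `set-mode` with the new value as its
# arguments; the settor's result is what actually gets stored.
# `mode` and `set-mode` are hypothetical names, not part of the PR.
set-mode = @ {
	echo 'set-mode called with:' $*
	result $*
}

mode = quiet

# A dynamically scoped override, mirroring the
# local (runflags = $runflags exitonfalse) pattern above.
local (mode = $mode strict) {
	echo 'inside:' $mode
}
echo 'outside:' $mode
```

A settor that wanted to reject or normalize a value could return something other than $*, since the stored value is whatever the settor returns.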

One other thing that has been added for this change is dump-time-only primitives. The primitives defined in the new file prim-dump.c are only available in the esdump binary, not the es binary. So far I've only used this to define a bootstrap $&batchloop that doesn't need to be in the es binary, and to inject some conditional compilation into initial.es. I suspect that it could be used for more, though.

All in all, this change is not really a shell-size win, measured either in LOC (though there are somewhat fewer lines of C, and fewer still if you discount the lines that are now used exclusively in esdump), or in binary size (in general, implementing the same behavior in es rather than C requires a bigger binary). But neither difference is very large. There is presumably also a negative performance impact, but I haven't profiled it and, judging just as an interactive user, it isn't as large as one might expect.

Follow-up edit: Perhaps I've gone too far, but all the option parsing has now been moved out of the C code and into a new pure-es %getopt function. This provides consistent behavior between %main, %dot, access, and vars. This was enabled by simplifying $&access, the last user of opt.c, and reimplementing the complicated option-parsing logic in initial.es.

jpco commented 2 months ago

It turns out there's actually a bit of convergent evolution here: the Plan 9 versions of rc (both plan9port and 9front, but not Rakitzis rc) invoke a script called rcmain to do part of startup, somewhat like the %main function does here. See the plan9port and 9front versions to compare.

Also, all versions of rc have a built-in command called flag which lets users query and (in some cases) modify the value of flags, giving functionality somewhat like runflags in this PR. I think using a special runflags variable with settors is more "es-ish", kind of like how es uses a signals variable and exceptions rather than rc's special sigfoo functions.

wryun commented 3 weeks ago

This looks really cool, and definitely in keeping with es's vibe (i.e. that since you have a capable language as part of the shell, the C code recedes into the background as much as possible). It's tempting me to relax my stance on 'es is in maintenance mode, so let's not do anything too crazy', though I do feel like people 'with a strong understanding of es semantics' might be a fairly small group compared to those who understand C.

jpco commented 3 weeks ago

It's tempting me to relax my stance on 'es is in maintenance mode, so let's not do anything too crazy'

I take that as quite the compliment :) FWIW, I'm obviously biased, but to my mind there's still a place for "crazy" changes to es, like the stuff Paul described in the mailing list as post-1.0. I don't think the "experiment" that es represents was ever really finished. But I don't have a strong enough understanding of github project management to say "let's shove it into this-or-that branch" or whatever.

people 'with a strong understanding of es semantics' might be a fairly small group compared to those who understand C.

Certainly the understatement of the last `{date +%s} seconds. However, I do think there's a non-negligible contingent of people who might be interested in es from the functional programming angle, who aren't necessarily C hackers. For those people, it's much nicer to be able to make a change just by learning the language they're already interested in than to also learn this dialect of semi-early-'90s C with a setjmp-based exception system and dynamic memory management done with a custom copying GC.

wryun / es-shell

Up for discussion: Move the "main runtime" of es out of C and into es script #79