Provide an API to evaluate code in a subinterpreter

oils-for-unix / oils

Oils is our upgrade path from bash to a better language and runtime. It's also for Python and JavaScript users who avoid shell!

http://www.oilshell.org/

Provide an API to evaluate code in a subinterpreter #704

Open andychu opened 4 years ago

andychu commented 4 years ago

And after that I'm still interested in #663, e.g. providing better APIs. I think one thing you mentioned could be
eval -g -- 'echo $x'   # evaluate in global scope, not local
?? It might make it easier for others to write a line editor.

ble.sh also saves and restores many other shell settings before and after the execution of user commands (including stty, set -exvu -o posix -o emacs -o vi, shopt -s nocasematch, IGNOREEOF, IFS, BASH_REMATCH, FUNCNEST, $?, $_, etc.). It may be better to run user commands in an independent environment rather than in a global context shared with the line editor (like the separate JavaScript object trees of different pages in a browser).
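
A minimal sketch of the save-and-restore pattern described above, written to show the shape of the problem rather than ble.sh's actual code. The function name run_user_command is hypothetical, and it covers only IFS, the set -o options, and the exit status; the real list of settings is much longer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from ble.sh): run a user command in the
# current shell, then put back a few settings the command may have changed.
run_user_command() {
  local saved_ifs="$IFS"
  local saved_set
  saved_set=$(set +o)   # prints "set -o ..." / "set +o ..." lines that recreate the current options

  eval "$1"
  local status="$?"

  # Undo whatever the user command did to the editor's settings.
  IFS=$saved_ifs
  eval "$saved_set"
  return "$status"
}

run_user_command 'IFS=:; set -f; echo "user command ran"'
```

Every setting has to be listed and restored by hand, which is the maintenance burden that a separate execution environment would remove.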

Also, I think we may consider separating the execution environment of the line editor from that of the user commands. ble.sh is implemented entirely in user space, so it provides complete flexibility, allowing users to change any part of ble.sh with shell scripts. But this flexibility is a double-edged sword. It is possible to completely break a ble.sh session by overwriting the shell functions of ble.sh. Also, ble.sh pollutes the shell variable namespace by defining a bunch of shell variables named _ble_*. Maybe it would be useful to provide some mechanism to protect/separate several execution environments, as well as a special way of defining shell variables/functions in one environment from another environment.

Note: ble.sh is not implemented that way, so separating execution environments would only be useful for implementing entirely new line editors.

from #653

andychu commented 4 years ago

@akinomyoga This is something I've been thinking about, since Python is developing it.

https://lwn.net/Articles/754162/ -- Subinterpreter support for Python

https://www.python.org/dev/peps/pep-0554/

Use cases:

- ble.sh
- Evaluating configuration files. I want users to be able to read an untrusted file like this:

# file.config

server {  # using Oil's block syntax
  name = www.example.com
  port = 80
  echo $(rm -rf / )    # NOT allowed in the subinterpreter
}

Maybe with:

eval -i name-of-interpreter -f file.config

Still up for discussion.

akinomyoga commented 4 years ago

Oh, I see. Thank you! I was actually wondering what "multiple interpreters" meant, but I get it now!

andychu commented 4 years ago

Actually there are two kinds of subinterpreters I can think of:

1. Running the same interpreter C++ code with a different state.Mem object and a different mutable_opts object. This is essentially what you're doing when saving and restoring state. That's this issue. And it's like Python's subinterpreter feature.
2. Using mycpp (the Python-to-C++ translator) to create different interpreters with different C++ code.
- This would be useful for tracing which is performance sensitive. You might not want the code for the traced interpreter to slow down the non-traced interpreter.
- It might also be useful for having a state.Mem that records all the execution history. Or you could even have a sample of execution history.
- This is more like what PyPy does: http://doc.pypy.org/en/latest/config/

The first one is straightforward and, thanks to recent refactorings, almost done. The second one is more speculative but could be useful. The interpreter is designed to be very modular so that things like this are possible.
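
To make "same code, different state" concrete: the closest thing existing shells offer is a subshell, which gets a private copy of variables and options. This is only a rough analog, not the proposed API, since a subshell is a forked process rather than a second state object inside one process. The helper name noglob_state is made up for this sketch.

```shell
#!/usr/bin/env bash
# Report whether the noglob option (set -f) is on in the current shell.
noglob_state() {
  case $- in
    *f*) echo on ;;
    *)   echo off ;;
  esac
}

x=outer
(
  # The subshell works on its own copy of variables and options.
  x=inner
  set -f
  echo "child:  x=$x noglob=$(noglob_state)"
)
# Nothing the child changed is visible here.
echo "parent: x=$x noglob=$(noglob_state)"
```

The child reports x=inner with noglob on, while the parent still sees x=outer with noglob off. A subinterpreter would give the same separation without the fork and with a way to pass values back.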

It should also help with porting to Windows or WebAssembly (WASI). I just separated all the code that starts processes into core/executor.py. I renamed core/cmd_exec.py to core/cmd_eval.py, since it's a "pure" evaluator now with no I/O.

Anyway this is just saying that Oil can do stuff that other shells can't... It's far in the future but I think it will prove useful.

Another project that illustrates this is "deno". One of the regrets of the creator of node.js is that node didn't preserve the security properties of v8. Basically v8 already has "subinterpreters", but the way that node uses v8 throws that away.

So deno is another embedding of v8 that retains the sandbox. So Oil will have a sandbox, which is unlike any other shell. I think that will be useful for evaluating config files and as you point out writing a shell UI in "user space" -- you want to keep all the internal variables and functions separate from the user's program.

https://dev.to/nickytonline/10-things-i-regret-about-nodejs-14m3

andychu commented 4 years ago

Relevant details on debugger support from Rocky Bernstein:

https://oilshell.zulipchat.com/#narrow/stream/121540-oil-discuss/topic/A.20debugger.20for.20Python.20bytecode/near/199149519

There are somewhat hacky routines to save and restore interpreter state. See lib/save-restore.sh which is custom for zsh; there are analogs in bashdb and kshdb.

https://github.com/rocky/zshdb/blob/master/lib/save-restore.sh

Another problem related to this is that, when debugging, you may be in a subshell of the main program that you started debugging, and debugger settings may be changed there. For example, inside the subshell you may want to set breakpoints, change the terminal width, or change the source-code style setting. In order to propagate these changes to the parent shell, we write out a journal which is eval()'d. This is done in lib/journal.sh.

https://github.com/rocky/zshdb/blob/master/lib/journal.sh
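
A minimal bash sketch of the journal idea, assuming a temporary file as the journal. The names journal_set and journal_replay are hypothetical and are not taken from zshdb; printf %q is a bash feature used here to quote values so they survive being re-read.

```shell
#!/usr/bin/env bash
# Hypothetical journal helpers; names are illustrative, not zshdb's.
journal_file=$(mktemp)

# Append an assignment to the journal, quoted so that replaying it
# reproduces the value exactly.
journal_set() {
  printf '%s=%q\n' "$1" "$2" >> "$journal_file"
}

# Replay the journal in the current shell, then empty it.
journal_replay() {
  source "$journal_file"
  : > "$journal_file"
}

width=80
(
  # Changes made in the subshell are lost when it exits...
  width=120
  style='dark mode'
  # ...unless they are also recorded in the journal.
  journal_set width "$width"
  journal_set style "$style"
)
echo "before replay: width=$width"
journal_replay
echo "after replay:  width=$width style=$style"
rm -f "$journal_file"
```

Before the replay the parent still has width=80; afterwards it has width=120 and style set to "dark mode". Replaying executes whatever is in the file, so the journal has to be written only by trusted code.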

andychu commented 4 years ago

Also mentioned on that thread:

https://www.manpagez.com/man/1/ksh/

   Name Spaces.
       Commands and functions that are executed as part of the list of a
       namespace command that modify variables or create new ones, create a
       new variable whose name is the name of the name space as given by
       identifier preceded by ..  When a variable whose name is name is
       referenced, it is first searched for using .identifier.name.
       Similarly, a function defined by a command in the namespace list is
       created using the name space name preceded by a ..

       When the list of a namespace command contains a namespace command, the
       names of variables and functions that are created consist of the
       variable or function name preceded by the list of identifiers each
       preceded by ..

       Outside of a name space, a variable or function created inside a name
       space can be referenced by preceding it with the name space name.

       By default, variables starting with .sh are in the sh name space.

And

   Type Variables.
       Typed variables provide a way to create data structure and objects.  A
       type can be defined either by a shared library, by the enum built-in
       command described below, or by using the new -T option of the typeset
       built-in command.  With the -T option of typeset, the type name,
       specified as an option argument to -T, is set with a compound variable
       assignment that defines the type.  Function definitions can appear
       inside the compound variable assignment and these become discipline
       functions for this type and can be invoked or redefined by each
       instance of the type.  The function name create is treated specially.
       It is invoked for each instance of the type that is created but is not
       inherited and cannot be redefined for each instance.

       When a type is defined a special built-in command of that name is
       added.  These built-ins are declaration commands and follow the same
       expansion rules as all the special built-in commands defined below that
       are preceded by --.  These commands can subsequently be used inside
       further type definitions.  The man page for these commands can be
       generated by using the --man option or any of the other -- options
       described with getopts.  The -r, -a, -A, -h, and -S options of typeset
       are permitted with each of these new built-ins.

andychu commented 4 years ago

Use case: handling signals in shell scripts, to emulate what GNU readline does?

We don't want to mess up the user's signal state?

https://oilshell.zulipchat.com/#narrow/stream/121540-oil-discuss/topic/Biggest.20Shell.20Programs/near/205087545

In this case, if the user presses C-c, the entire processing will be killed, i.e., the line editor implemented in this way will stop or hang by the key press C-c. One might think we can set a trap handler trap ... INT or disable the signal by stty, but it affects the behavior of the user programs as well as the line editor itself, so I don't want to change the signal settings.
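
A short bash sketch of why the trap approach is intrusive: the INT trap is a single shell-wide setting, so an editor that installs its own handler has to save and restore the user's handler by hand. The function name read_line_with_own_sigint and the variable editor_interrupted are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical editor helper: the INT trap is one shared, global setting.
read_line_with_own_sigint() {
  local saved_trap
  saved_trap=$(trap -p INT)          # empty if no INT trap is currently set
  trap 'editor_interrupted=1' INT    # the editor's handler replaces the user's

  # ... the editor's key-reading loop would run here ...

  # Put the user's handler back, or reset to the default if there was none.
  if [ -n "$saved_trap" ]; then
    eval "$saved_trap"
  else
    trap - INT
  fi
}

trap 'echo "user handler"' INT
read_line_with_own_sigint
trap -p INT    # shows the user's handler again
```

While the editor's loop runs, the user's handler is simply gone, which is the side effect the comment above wants to avoid. Separate interpreter state for the editor would keep the two trap tables apart.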

andychu commented 4 years ago

Very related:

Child interpreters in Tcl: https://www.tcl.tk/man/tcl8.5/tutorial/Tcl43.html

If the child is created with the -safe option, it will not be able to access the file system, or otherwise damage your system. This feature allows a script to evaluate code from an unknown (and untrusted) source.

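# Create the two child interpreters that $i1 and $i2 refer to below
set i1 [interp create firstChild]
set i2 [interp create secondChild]

# Set a variable "name" in each child interp, and
#  create a procedure within each interp
#  to return that value
foreach int [list $i1 $i2] {
    interp eval $int [list set name $int]
    interp eval $int {proc nameis {} {global name; return "nameis: $name";} }
}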