wren-cli should support reading from stdin

guenther-brunthaler commented 3 years ago

It is frequently useful to read a script from a pipe or an SSH connection rather than from a local file.

I would like to do something like this:

$ echo 'System.print("hello")' | wren-cli

or

$ echo 'System.print("hello")' | wren-cli -

or

$ echo 'System.print("hello")' | wren-cli /dev/stdin

but none of these work.

Another useful pattern is embedding in a shell script, which I do quite frequently with awk. For instance, it would be nice if one could do the following:

#! /bin/sh
...
wren-cli /dev/fd/5 < input.txt 5<< 'EOF'
... wren code goes here...
EOF
... shell script continues

Note that in this example neither the script nor the file to be processed is passed directly to wren; both arrive indirectly via shell redirections. The code comes in via file descriptor # 5, allowing file descriptor # 0 (standard input) to be used normally for feeding input to the script.

Also note that the redirection and EOF handling is all done by the shell; wren-cli does not have to care about it. It only needs to support opening /dev/fd/5 as a stream. A simple fopen() would do the trick. But wren-cli evidently performs additional checks that fail because /dev/fd/5 is a pipe and not a regular file. In particular, one cannot fseek() within a pipe or get its size in advance. The only thing one can do is read from it until EOF occurs.
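
To make the pipe limitation concrete, here is a minimal C sketch of a size-first check. The helper name stream_size is hypothetical and is not taken from the wren-cli sources; it only illustrates why a loader that asks for the size up front cannot work on a pipe, where fseek() fails.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from wren-cli): return the number of bytes in
   `f`, or -1 if the stream does not support seeking (e.g. a pipe). */
long stream_size(FILE *f) {
    assert(f != NULL);
    if (fseek(f, 0L, SEEK_END) != 0) return -1;  /* fails with ESPIPE on a pipe */
    long size = ftell(f);
    if (size < 0) return -1;
    if (fseek(f, 0L, SEEK_SET) != 0) return -1;  /* rewind so the caller can read */
    return size;
}
```

A loader built on such a helper works for regular files only; for anything else it has to fall back to reading until EOF.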

And all this is embedded within a shell script, which makes things easier because a single file suffices for the combined task tackled by the shell, wren-cli, and potentially even more languages like awk invoked in the same way from within the script.

Anyway, it would really be nice if wren allowed reading its script from a pipe and not just from a real file. Currently, the following happens:

$ echo 'System.print("hello")' | wren-cli /dev/stdin
Could not read file "/dev/stdin".

joshgoebel commented 3 years ago

Ruby, is the requirement for the full script to be passed in a single block of RAM hard baked into Wren? Would it be possible to read it in chunks and stream it into the parser or would that be so difficult that we should first look for other solutions here on the CLI side?

We could of course read it in chunks on the CLI side and then combine them to a single large string that we pass to Wren if need be...

What would be a reasonable upper limit on the memory to allow for this?
Could we just malloc that amount at once (since we'll free it shortly anyways) or do we need to malloc in smaller chunks until we determine the actual size of input from a pipe and then make a single correctly sized malloc?

At first glance, if Wren can't take streamed data, we just have to read the whole input stream into RAM to determine its size (removing the seek operation).
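
One possible answer to the malloc question, as a sketch only: start with a modest buffer and double it with realloc() until EOF, so no size has to be known in advance. The helper name read_all and the 4096-byte starting capacity are assumptions made for illustration; this is not code from wren-cli.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from wren-cli): read `f` until EOF into a
   NUL-terminated heap buffer. It never seeks, so it also works on pipes.
   Returns NULL on I/O or allocation failure; the caller frees the result. */
char *read_all(FILE *f, size_t *out_len) {
    assert(f != NULL);
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) return NULL;
    for (;;) {
        /* Keep one byte in reserve for the terminating NUL. */
        len += fread(buf + len, 1, cap - len - 1, f);
        if (ferror(f)) { free(buf); return NULL; }
        if (feof(f)) break;
        /* fread() filled the buffer without reaching EOF: double it. */
        if (cap > (size_t)-1 / 2) { free(buf); return NULL; }
        char *grown = realloc(buf, cap * 2);
        if (grown == NULL) { free(buf); return NULL; }
        buf = grown;
        cap *= 2;
    }
    buf[len] = '\0';
    if (out_len != NULL) *out_len = len;
    return buf;
}
```

The resulting string could then be handed to the VM as a single block. An upper limit on memory, if wanted, would be one extra comparison against cap before the realloc() call.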

joshgoebel commented 3 years ago

echo 'System.print("hello")' | wren-cli -

I just added this on my branch since it's easy to key off of the - as a flag to behave differently vs trying to figure it out from looking at the file metadata.

https://github.com/joshgoebel/wren-cli/commit/7d8409d3cc4572c201db471a68d18537f4b5c91a
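
For readers who do not want to open the commit, the idea of keying off the - argument can be sketched in a few lines of C. This is an illustration under assumptions, not the code from the linked commit; the helper name open_script is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the code from the linked commit): treat a lone
   "-" as "read the script from standard input"; otherwise open the named
   file. Returns NULL if the file cannot be opened. */
FILE *open_script(const char *path) {
    assert(path != NULL);
    if (strcmp(path, "-") == 0) return stdin;
    return fopen(path, "rb");
}
```

The advantage of an explicit flag is that no guessing from file metadata is needed: the caller states the intent.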

guenther-brunthaler commented 3 years ago

Very nice!

This solves the problem for reading a script from standard input, and it should also work for embedding a wren script into a shell script.

However, another case still remains: How to embed a wren script in a shells script and still allow that script to read its data input from standard input? Standard input obviously cannot be used for both, reading the script itself and also the script's input.

In such a case, the embedded script would need to be read from a file descriptor other than # 0 (= standard input), so the script can read from # 0 itself (which is the standard input of the shell script) once it has been started.

For instance, here is an example of how AWK can be embedded into a shell script in order to add up a list of numbers fed into its standard input:

#! /bin/sh
exec awk -f /dev/fd/5 5<< EOF
{sum+= $0}
END {print sum}
EOF

This is better than just writing "#! /usr/bin/awk" in the shebang line, because the awk executable will be found anywhere in $PATH and does not need a hard-coded pathname location. (More than one AWK implementation might be installed on a system, and the user can set up $PATH so that the preferred variant will be found first).

guenther-brunthaler commented 3 years ago

Correction: I forgot to quote EOF correctly in my previous post. The exec line should read

exec awk -f /dev/fd/5 5<< 'EOF'

or the $0 would be substituted by the shell rather than be interpreted by AWK as intended.

joshgoebel commented 3 years ago

Already works with my branch, not sure if there are caveats.

#!/bin/sh
./bin/wren_cli /dev/fd/5 < input.txt 5<< 'EOF'
import "io" for Stdin
System.print("booger")
System.print(Stdin.readLine())
EOF

I think the trick here is (maybe?) that the heredoc is of a known length so it can be passed as a file not a stream (just a guess)? vs pipes which give you a FIFO stream and are a whole other ball of wax.

guenther-brunthaler commented 3 years ago

I think the trick here is (maybe?) that the heredoc is of a known length so it can be passed as a file not a stream (just a guess)? vs pipes which give you a FIFO stream

I think you are right. The following test program "lastbyte.c"

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

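int main(void) {
   int c;
   /* fseek() fails if stdin is not seekable (e.g. ESPIPE for a pipe). */
   if (fseek(stdin, -1, SEEK_END)) goto ioerr;
   /* Read and report the single byte before the end of the input. */
   if ((c= getchar()) == EOF) goto ioerr;
   if (printf("Last byte == 0x%02x\n", c) < 0) goto ioerr;
   if (fflush(stdout)) goto ioerr;
   return EXIT_SUCCESS;

   ioerr:
   (void)fprintf(stderr, "I/O error # %d: %s!\n", errno, strerror(errno));
   return EXIT_FAILURE;
}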

fails when reading directly from the terminal but succeeds when reading from a heredoc redirection.

I did not expect this, but seeking is evidently supported on a heredoc.

joshgoebel commented 3 years ago

If you wanted to go a bit further and double check what the file st_mode is for a heredoc that might be of interest: https://stackoverflow.com/questions/1312922/detect-if-stdin-is-a-terminal-or-pipe
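
A minimal sketch of such an st_mode check, assuming a POSIX system. The enum and the helper name stream_kind are hypothetical names chosen for illustration. What a heredoc reports depends on how the shell implements it, which is exactly what running a check like this on stdin would reveal.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

typedef enum {
    KIND_UNKNOWN, KIND_REGULAR, KIND_FIFO, KIND_CHAR_DEVICE, KIND_OTHER
} StreamKind;

/* Hypothetical helper: classify the file behind `f` using the st_mode
   field filled in by fstat(). */
StreamKind stream_kind(FILE *f) {
    struct stat st;
    assert(f != NULL);
    if (fstat(fileno(f), &st) != 0) return KIND_UNKNOWN;
    if (S_ISREG(st.st_mode))  return KIND_REGULAR;      /* seekable file */
    if (S_ISFIFO(st.st_mode)) return KIND_FIFO;         /* pipe or FIFO */
    if (S_ISCHR(st.st_mode))  return KIND_CHAR_DEVICE;  /* e.g. a terminal */
    return KIND_OTHER;
}
```

Calling stream_kind(stdin) under a pipe, a terminal, and a heredoc redirection would show which of these cases can be loaded by size and which need a read-until-EOF path.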

So if we got to a point where we worked with stdin (via -) and heredoc, does that resolve MOST use cases? Someone who needed pipes could instead do:

* Write file to disk
* Pass file to CLI vs pipe

Perhaps there are even tools for this? (tee?)

Since the VM design (AFAIK) requires that we pass the ENTIRE source code to the VM for compilation true "streaming" isn't a possibility anyways...

guenther-brunthaler commented 3 years ago

So if we got to a point where we worked with stdin (via -) and heredoc, does that resolve MOST use cases?

I would say it is now as good as it can get under the restriction regarding the "entire source"-requirement mentioned.

Someone who needed pipes could instead do:
* Write file to disk

True, but this requires write permission which might not always be available.

But given the "entire source"-requirement, there is not much which can be done.

Also, thanks to the heredoc "seek"-capability discovered in the previous posts, all important cases should be covered now. I think we can live without incremental parsing (i.e. lifting the "entire source"-requirement).

joshgoebel commented 3 years ago

True, but this requires write permission which might not always be available.

Typically /tmp is at least available (for situations just like this)...
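
The temporary-file route can also be taken inside a program rather than by the user. The following C sketch copies a non-seekable stream into an anonymous temporary file, which standard C's tmpfile() creates (typically under /tmp on Unix-like systems) and removes automatically on close. The helper name spool_to_tmpfile is hypothetical, and nothing here is taken from wren-cli.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy everything from `in` into an anonymous
   temporary file and return that file, rewound to the start.
   Returns NULL on failure. */
FILE *spool_to_tmpfile(FILE *in) {
    assert(in != NULL);
    FILE *tmp = tmpfile();
    if (tmp == NULL) return NULL;
    char chunk[4096];
    size_t n;
    /* Copy in fixed-size chunks until EOF or a read error. */
    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        if (fwrite(chunk, 1, n, tmp) != n) { fclose(tmp); return NULL; }
    }
    if (ferror(in) || fflush(tmp) != 0) { fclose(tmp); return NULL; }
    rewind(tmp);
    return tmp;
}
```

The returned stream is a regular file, so seeking and size queries work on it even when the original input was a pipe.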

wren-lang / wren-cli

wren-cli should support reading from stdin #55