Co-routines

nicowilliams commented 7 years ago

It'd be nice to be able to write:

def f(a; b; c):
    with_coroutine(a) as $a |
    with_coroutine(b) as $b |
    with_coroutine(c) as $c | 
    [co($a), co($b), co($c)];

f(range(3); range(3;6); range(6;9)) # -> [0, 3, 6] [1, 4, 7] [2, 5, 8]

or even better:

def f(@a; @b; @c):
    [@a, @b, @c];

f(range(3); range(3;6); range(6;9)) # -> [0, 3, 6] [1, 4, 7] [2, 5, 8]

This is somewhat inspired by Icon's co-routines. In Icon one can also pass new inputs to co-routines, and even refresh (restart) them, but for jq I think passing a new input to a co-routine would be the same as restarting it. Restarting a co-routine could be restart(@name), which will get whatever . is passed in as its new input.
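To make the intended behaviour concrete, here is a minimal sketch in Python rather than jq, since the syntax above is only a proposal. It models a co-routine handle as a restartable generator. The names `Coroutine`, `co`, `restart`, and `f` are hypothetical stand-ins for the proposed `with_coroutine`, `co(...)`, and `restart(@name)`; nothing here reflects an actual jq implementation.

```python
from typing import Any, Callable, Iterator, List


class Coroutine:
    """Hypothetical model of a co-routine handle: a restartable generator."""

    def __init__(self, make: Callable[[Any], Iterator[Any]], inp: Any = None):
        self._make = make      # factory standing in for the jq expression
        self._it = make(inp)   # current run of that expression

    def co(self) -> Any:
        """Return the next output; raises StopIteration once complete."""
        return next(self._it)

    def restart(self, new_input: Any) -> None:
        """Start over, feeding new_input as the expression's input."""
        self._it = self._make(new_input)


def f(*handles: Coroutine) -> Iterator[List[Any]]:
    """Pull one output from every handle per step; stop when any is done."""
    while True:
        try:
            yield [h.co() for h in handles]
        except StopIteration:
            return


rows = list(f(Coroutine(lambda _: iter(range(3))),
              Coroutine(lambda _: iter(range(3, 6))),
              Coroutine(lambda _: iter(range(6, 9)))))
print(rows)  # [[0, 3, 6], [1, 4, 7], [2, 5, 8]]

# Restarting with a new input begins the expression again from scratch.
counter = Coroutine(lambda n: iter(range(n)), 2)
before = counter.co()
counter.restart(5)
after = [counter.co() for _ in range(5)]
```

In this model, passing a new input and restarting are the same operation, which matches the suggestion above for jq.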

nicowilliams commented 7 years ago

def f(@a; @b; @c):
    [@a//null, @b//null, @c//null]; # //null in case the different co-routines have different numbers of outputs
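As a rough illustration of the padding idea, the following Python sketch (not jq) steps several generators in lockstep and fills in `None` for any that have run out. It assumes the intent is to keep going until every co-routine is exhausted, which the snippet above does not state. Note also that jq's `//` replaces `false` and `null` outputs as well as missing ones, which this model does not reproduce.

```python
from itertools import zip_longest
from typing import Any, Iterator, List


def f(*gens: Iterator[Any]) -> Iterator[List[Any]]:
    """Step all generators together, padding exhausted ones with None."""
    for row in zip_longest(*gens, fillvalue=None):
        yield list(row)


rows = list(f(iter(range(2)), iter(range(3, 6)), iter([6])))
print(rows)  # [[0, 3, 6], [1, 4, None], [None, 5, None]]
```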

nicowilliams commented 7 years ago

And with varargs (see #1341):

def f(@args[]):
     range(args$[]) as $i | @args$[$i];

Ahh, we need a way to determine the number of varargs arguments; here I used args$[].

nicowilliams commented 7 years ago

So, I really like the idea of @name as syntax for referring to things like co-routines, and, really, also open file handles!

Basically, a @name reference would be a lot like a def, but closing over internal state other than jvs and other defs. Because a @name would be like a function, it can take an input value (.), and will output zero, one, or more values.

A @name representing a file open for reading would ignore its input and output either the next input from the file, or all of them (depending on open-time options).

A @name representing a file open for writing would write its inputs and either output them too or output empty, depending on open-time options.

A @name representing a co-routine would ignore its input and output the next output of the co-routine, or empty if the co-routine is complete. A rewind @name would reset a co-routine (or file handle, where sensible) and feed it a new input.

This would mean there's no need to have jv-like file handles. And you could not store @names, only pass them around.
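A minimal Python sketch of the two file-handle behaviours described above may help. `ReadHandle`, `WriteHandle`, `all_at_once`, and `echo` are hypothetical names invented for this illustration; the "open-time options" are modeled as constructor flags. Each call returns a list standing in for "zero, one, or more outputs".

```python
import io
from typing import List, TextIO


class ReadHandle:
    """Handle open for reading: ignores its input, outputs file contents."""

    def __init__(self, stream: TextIO, all_at_once: bool = False):
        self._stream = stream
        self._all = all_at_once  # open-time option: next line vs. all lines

    def __call__(self, _ignored: object = None) -> List[str]:
        if self._all:
            return [line.rstrip("\n") for line in self._stream]
        line = self._stream.readline()
        return [line.rstrip("\n")] if line else []  # empty at end of file


class WriteHandle:
    """Handle open for writing: writes its input, optionally echoes it."""

    def __init__(self, stream: TextIO, echo: bool = False):
        self._stream = stream
        self._echo = echo  # open-time option: output the input, or empty

    def __call__(self, value: str) -> List[str]:
        self._stream.write(value + "\n")
        return [value] if self._echo else []


reader = ReadHandle(io.StringIO("a\nb\n"))
reads = [reader(), reader("ignored"), reader()]  # [['a'], ['b'], []]

sink = io.StringIO()
writer = WriteHandle(sink, echo=True)
echoed = writer("hi")                            # ['hi']
```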

nicowilliams commented 7 years ago

Another thing: in Icon one can pass new values to co-routines, and those values are then available via a special keyword. That would work for jq, though it'd be a bit weird since it would be like the inputs builtin: non-deterministic, but we've already crossed that Rubicon.

OTOH, @name would not work well for full-duplex I/O, but we could use two handles, one for each direction.

We could even create threads to run co-routines in the background, and have a builtin that takes an arbitrary number of handles and returns the name/index of one that is ready for I/O, and this could be the basis for async I/O support in jq. Varargs would absolutely be a requirement here.
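The background-thread idea can be sketched in Python with threads feeding a shared queue. This is only a model under assumptions: `run_in_background` is a hypothetical name, and instead of a builtin that returns just the index of a ready handle, it yields `(index, value)` pairs as each co-routine produces output.

```python
import queue
import threading
from typing import Any, Callable, Iterable, Iterator, List, Tuple

_DONE = object()  # sentinel marking the end of one co-routine's output


def run_in_background(
    sources: List[Callable[[], Iterable[Any]]]
) -> Iterator[Tuple[int, Any]]:
    """Run each source in its own thread; yield (index, value) as ready."""
    ready: "queue.Queue[Tuple[int, Any]]" = queue.Queue()

    def worker(index: int, make: Callable[[], Iterable[Any]]) -> None:
        for value in make():
            ready.put((index, value))
        ready.put((index, _DONE))

    threads = [
        threading.Thread(target=worker, args=(i, make), daemon=True)
        for i, make in enumerate(sources)
    ]
    for t in threads:
        t.start()

    remaining = len(threads)
    while remaining:
        index, value = ready.get()  # blocks until some handle is ready
        if value is _DONE:
            remaining -= 1
        else:
            yield index, value


events = list(run_in_background([lambda: range(3), lambda: "ab"]))
```

The order in which the two sources interleave is not deterministic, but each source's own outputs arrive in order.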

Ultimately, the nice thing about @name syntax is that it would make handles [to co-routines/threads, open files, pipes, sockets, databases, ...] lexically scoped, just like $name syntax, with handles closed automatically when their creation expressions are backtracked through, and with no internal details leaking out to the jq program.
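The "closed automatically when backtracked through" behaviour resembles how a Python generator's `finally` block runs when the generator is closed or exhausted. The sketch below is an analogy only; `with_handle` and `log` are hypothetical names, and the body is any function that takes the handle and produces outputs.

```python
from typing import Any, Callable, Iterable, Iterator, List

log: List[str] = []


def with_handle(name: str,
                body: Callable[[str], Iterable[Any]]) -> Iterator[Any]:
    """Open a handle, run body with it, and always close it afterwards."""
    log.append(f"open {name}")
    try:
        yield from body(name)
    finally:
        # Runs when the body is exhausted or when the consumer abandons it,
        # which plays the role of backtracking through the creation point.
        log.append(f"close {name}")


gen = with_handle("a", lambda h: iter([1, 2, 3]))
first = next(gen)   # the handle is opened on first use
gen.close()         # the consumer gives up early; the handle still closes
```

The handle name never escapes `with_handle`, so nothing outside the body can hold on to it.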

nicowilliams commented 7 years ago

Actually, there should be a single operator / syntax for creating co-routines. Co-routines should access inputs passed on each invocation via input/inputs and should get a new input every time they are context-switched to. The input to the left-most filter in a co-routine should probably be null. There should be an operator for restarting a co-routine.
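Python generators offer a close analogue of "a new input on every context switch": `send` resumes the generator and delivers a value to it. The sketch below is a Python model, not jq; `running_total` is a made-up example, and the initial `None` loosely plays the role of the null starting input.

```python
from typing import Generator, Optional


def running_total() -> Generator[Optional[int], int, None]:
    """Co-routine that receives one input per resume and outputs a sum."""
    total = 0
    received = yield            # first resume carries no input (None)
    while True:
        total += received
        received = yield total  # output, then wait for the next input


co = running_total()
primed = next(co)                        # start it; nothing to output yet
outs = [co.send(x) for x in (1, 2, 3)]   # [1, 3, 6]

# Restarting is modeled by simply creating a fresh generator.
co = running_total()
next(co)
restarted = co.send(10)                  # 10
```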

This will be most similar to... Icon!

nicowilliams commented 7 years ago

And there should be a flushinputs builtin too.

nicowilliams commented 7 years ago

Possible syntax:

def alternate(@a; @b): while (a; .,b);

alternate(range(5); range(4; -1; -1))

This allows a function to decide to make co-routines out of some of its arguments. The co-routines look like and are functions. When a function exits its frame, it cleans up the co-routines.
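The issue does not state what `alternate` should output. One plausible reading, sketched here in Python as an assumption, is that it interleaves the outputs of the two co-routines and cleans both up when its frame exits, however it exits.

```python
from typing import Any, Generator, Iterator


def alternate(a: Generator, b: Generator) -> Iterator[Any]:
    """Interleave outputs of a and b; close both when this frame exits."""
    try:
        while True:
            try:
                yield next(a)
                yield next(b)
            except StopIteration:
                return
    finally:
        a.close()
        b.close()


out = list(alternate((i for i in range(5)),
                     (i for i in range(4, -1, -1))))
print(out)  # [0, 4, 1, 3, 2, 2, 3, 1, 4, 0]

# Abandoning the function early still cleans up its co-routines.
a = (i for i in range(5))
b = (i for i in range(5))
partial = alternate(a, b)
head = next(partial)
partial.close()
```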

nicowilliams commented 7 years ago

One interesting thing will be handling tail calls: a tail call from a function frame that has co-routines cannot be made a proper tail call without first cleaning up the co-routines. The way I envision this is to keep a stack of {jq_state instance pointer, jq stack address} entries, one for each place where a co-routine has been allocated. A tail call has to check this stack, and then either the tail call is disabled or the co-routines are cleaned up (by forcibly unwinding/backtracking to hit all the co-routine creation instructions).

There would have to be a new instruction for making a co-routine. It would create a jq_state with a copy of the parent but set to start at the right place. When backtracking through this instruction the jq_state would be cleaned up.
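The tail-call constraint can be modeled in a few lines of Python. This is a toy under assumptions, not jq's interpreter: `Frame`, `make_coroutine`, `can_tail_call`, and `tail_call` are hypothetical names, and a generator stands in for a cloned jq_state.

```python
from typing import Any, Callable, Generator, List


class Frame:
    """Toy function frame that records the co-routines it has created."""

    def __init__(self) -> None:
        self.coroutines: List[Generator] = []

    def make_coroutine(self, gen: Generator) -> Generator:
        """Stand-in for the new instruction that creates a co-routine."""
        self.coroutines.append(gen)
        return gen

    def cleanup(self) -> None:
        """Stand-in for backtracking through the creation instructions."""
        while self.coroutines:
            self.coroutines.pop().close()


def can_tail_call(frame: Frame) -> bool:
    """A proper tail call is only possible with no live co-routines."""
    return not frame.coroutines


def tail_call(frame: Frame, target: Callable[..., Any], *args: Any) -> Any:
    """Second option from the text: clean up first, then make the call."""
    frame.cleanup()
    return target(*args)


frame = Frame()
g = frame.make_coroutine(i for i in range(3))
first = next(g)
allowed_before = can_tail_call(frame)        # False: a co-routine is live
result = tail_call(frame, lambda x: x + 1, 41)
allowed_after = can_tail_call(frame)         # True: frame is clean again
```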

nicowilliams commented 7 years ago

An alternative syntax could be @<expr> as <name> |, and then we could make def f(@a; @b): ... work like it does for $formal_argument. I like this.

fadado commented 7 years ago

I really wish I could start testing the coroutines. I actually have and have studied the Icon book. It's obviously out of print, but is available to download! In fact I reread all the old Icon and SNOBOL books and articles in order to learn to program with jq ;-)

nicowilliams commented 7 years ago

@fadado :]

Yes, I have a soft spot for Icon. I do wish it had closures. I also wish it still compiled to C, and preferably C with GCC extensions like local functions and computed gotos. Examining the old Icon compiler output was a fun way to learn what continuation passing style (CPS) is and how it works.

Regarding co-routines, I guess an implementation plan would look like this:

- finish the C-coded generators branch and make sure that tail calls clean up C-coded generator states
- add a slot in frames for a list of co-routines
- add an opcode to create a co-routine which on backtrack/raising cleans up co-routines as with C-coded generators
- add machinery for "cloning" jq_state instances, sharing bytecode
  - this will need a way to start a co-routine in a specific code block other than the top-level
  - this will need a way to change how input/inputs work in co-routines (access inputs provided by callers)
  - add a flushinputs builtin -- hmmm, maybe a latestinput that discards earlier unconsumed inputs
- add syntax to generate the new opcode
- lastly, add simple I/O builtins as discussed, including a sandbox/soft-chroot option for the command-line
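The difference between `flushinputs` and the proposed `latestinput` can be sketched with a small queue in Python. `Inbox` and its method names are hypothetical; they only model a co-routine's pending inputs from callers, not jq's actual `input`/`inputs` machinery.

```python
from collections import deque
from typing import Any, Deque


class Inbox:
    """Toy model of the inputs a caller has passed to a co-routine."""

    def __init__(self) -> None:
        self._pending: Deque[Any] = deque()

    def push(self, value: Any) -> None:
        """Caller side: provide a new input."""
        self._pending.append(value)

    def input(self) -> Any:
        """Oldest unconsumed input (raises IndexError if there is none)."""
        return self._pending.popleft()

    def flushinputs(self) -> None:
        """Discard every unconsumed input."""
        self._pending.clear()

    def latestinput(self) -> Any:
        """Newest input, discarding the earlier unconsumed ones."""
        value = self._pending.pop()
        self._pending.clear()
        return value


box = Inbox()
for v in (1, 2, 3):
    box.push(v)
oldest = box.input()        # 1
newest = box.latestinput()  # 3; the unconsumed 2 is discarded
```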

nicowilliams commented 7 years ago

The good news is that I am getting confident about both the design and the syntax.

nicowilliams commented 7 years ago

Also, I want this as much as you, @fadado.

jqlang / jq

Co-routines #1342