Open zah opened 6 years ago
Another useful consideration is async friendliness, that is, ensuring all relevant handles are accessible so they can be wrapped into an async-flavoured API.
Partially related: execCmd / execCmdEx also have a different API from startProcess and execProcess; e.g. they support neither env nor a timeout.
/cc @zah as mentioned here (https://github.com/nim-lang/Nim/issues/9953#issuecomment-446951119) I like D's API for std.process: https://dlang.org/phobos/std_process.html, which is organized into low-level, mid-level, and high-level layers. Nim doesn't have the lowest-level one, and has a clunkier API that is less flexible (e.g. we can't redirect to a File, we can't selectively redirect, etc.), and has issues like https://github.com/nim-lang/Nim/issues/9953 and https://github.com/nim-lang/Nim/issues/9975.
I think we could take inspiration from that design, but with some modifications to take advantage of the Nim language. It's organized into (simplifying a bit):
# lowest level: just return Pid (containing process id, etc); take File inputs (defaulting to stdin/stdout/stderr), plus the rest
@trusted Pid spawnProcess(scope const(char[])[] args, File stdin = std.stdio.stdin, File stdout = std.stdio.stdout, File stderr = std.stdio.stderr, const string[string] env = null, Config config = Config.none, scope const char[] workDir = null);
# middle level: return pipes
@safe ProcessPipes pipeProcess(scope const(char[])[] args, Redirect redirect = Redirect.all, const string[string] env = null, Config config = Config.none, scope const(char)[] workDir = null);
# high level: return (exitCode, output)
@trusted auto execute(scope const(char[])[] args, const string[string] env = null, Config config = Config.none, size_t maxOutput = size_t.max, scope const(char)[] workDir = null);
in Nim this could be:
proc spawnProcess(cmd: string, args: seq[string] = @[], stdinFile = stdin, stdoutFile = stdout, stderrFile = stderr, env: Table[string, string] = nil, config = initConfig(), workDir = ""): Pid
proc pipeProcess(cmd: string, args: seq[string] = @[], redirect: Redirect, env: Table[string, string] = nil, config = initConfig(), workDir = ""): Pipes
proc execute(cmd: string, args: seq[string] = @[], env: Table[string, string] = nil, config = initConfig(), workDir = ""): (exitCode: int, output: string, error: string)
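To make the layering concrete, here is a minimal sketch in Python (used only because its subprocess module maps directly onto the three levels; the function names spawn_process / pipe_process / execute are illustrative, not a proposed Nim API): the low level returns just a process handle with streams defaulting to the parent's, the mid level returns the handle with all streams piped, and the high level runs to completion and returns the captured result.

```python
import subprocess
import sys

def spawn_process(args, stdin=None, stdout=None, stderr=None, env=None, cwd=None):
    # Low level: start the child and return only the handle;
    # None means that stream is inherited from the parent.
    return subprocess.Popen(args, stdin=stdin, stdout=stdout, stderr=stderr,
                            env=env, cwd=cwd)

def pipe_process(args, env=None, cwd=None):
    # Mid level: create pipes for all three standard streams,
    # leaving reading/writing to the caller.
    return subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, env=env, cwd=cwd)

def execute(args, env=None, cwd=None):
    # High level: run to completion, return (exit_code, stdout, stderr).
    p = pipe_process(args, env=env, cwd=cwd)
    out, err = p.communicate()
    return p.returncode, out, err

code, out, err = execute([sys.executable, "-c", "print('hi')"])
```

Each level is built on the one below it, which is the main attraction of the D design: the high-level convenience functions stop being a separate code path with their own quirks.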
if you've ever seen garbled cgen errors like this:
/Users/timothee/git_clone/nim/timn/build/nimcache/@mt11253.nim.c:94:2: error: use of /Users/timothee/git_clone/nim/timn/build/nimcache/@mt11253b.nim.cundeclared:87:2: error: identifieruse 'Foo'of undeclared
identifier 'Foo'
Foo a; Foo a;
^ ^
/Users/timothee/git_clone/nim/timn/build/nimcache/@mt11253b.nim.c:89:22: error: use of undeclared identifier 'a'
nimZeroMem((void*)(&a), sizeof(Foo));
it's related to this.
the problem is that poParentStreams is not so good when dealing with multiple child processes (execProcesses), as their stdout/stderr get interleaved (especially when partial lines are written), causing hard-to-interpret logs.
the fix is to implement proper stream capture, with a user-supplied callback that decides what to do with each captured line/chunk of stdout/stderr. The way to do that correctly is the same way you'd handle separate stdout/stderr streams: via the selectors API (which uses platform-specific APIs, e.g. kqueue on OSX); each time a pipe gets data, the user-supplied callback is invoked with that data. That way the user can also attribute the data to the stream/child it came from. It also doesn't block, nor does it hog memory, since it's up to the user callback to handle the data.
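As a rough POSIX sketch of that mechanism, here is the same idea in Python, whose selectors module parallels the Nim selectors API being referenced (run_with_callback and the (child_index, stream_name, data) callback signature are illustrative names, not an existing API): every child's stdout/stderr pipe is registered with one selector, and whenever a pipe has data the callback receives it tagged with its origin.

```python
import selectors
import subprocess
import sys

def run_with_callback(cmds, on_data):
    # Spawn all children with piped stdout/stderr and register every pipe.
    sel = selectors.DefaultSelector()
    procs = []
    for i, cmd in enumerate(cmds):
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        procs.append(p)
        sel.register(p.stdout, selectors.EVENT_READ, (i, "stdout"))
        sel.register(p.stderr, selectors.EVENT_READ, (i, "stderr"))
    open_streams = 2 * len(procs)
    while open_streams:
        for key, _ in sel.select():
            data = key.fileobj.read1(4096)  # one available chunk, no blocking read loop
            if data:
                on_data(*key.data, data)    # attribute chunk to (child, stream)
            else:                           # EOF on this pipe
                sel.unregister(key.fileobj)
                open_streams -= 1
    sel.close()
    return [p.wait() for p in procs]

chunks = []
codes = run_with_callback(
    [[sys.executable, "-c", "import sys; sys.stderr.write('{0123456789}\\n')"]] * 2,
    lambda i, stream, data: chunks.append((i, stream, data)))

# Reassemble per-child output from the tagged chunks.
per_child = {}
for i, stream, data in chunks:
    per_child[(i, stream)] = per_child.get((i, stream), b"") + data
```

Because every chunk arrives tagged, the caller can prefix each line with the child's index (or buffer per child and flush whole lines), which removes the interleaving problem entirely instead of working around it with n=1.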
# main.nim:
when true:
  import osproc, strformat, random, os
  proc main() =
    let file = currentSourcePath.parentDir / "t11254b.nim"
    let outFile = "/tmp/z01"
    const nim = getCurrentCompilerExe()
    doAssert execShellCmd(fmt"{nim} c -o:{outFile} {file}") == 0
    var cmds: seq[string]
    let n = 3
    for i in 0..<n:
      cmds.add outFile
    let res = execProcesses(cmds, {poUsePath, poParentStreams}) # garbled
    # let res = execProcesses(cmds, {poUsePath, poParentStreams}, n = 1) # ok
    doAssert res == 0
  main()
# t11254b.nim:
when true:
  proc main =
    for i in 0..<3:
      stderr.write "{"
      for j in 0..<10:
        stderr.write $j
      stderr.write "}\n"
  main()
output with n=1:
{0123456789}
{0123456789}
{0123456789}
{0123456789}
{0123456789}
{0123456789}
{0123456789}
{0123456789}
{0123456789}
output with n = countProcessors():
{{{000111222333444555666777888999}
}
}
{{{00011122233344455566677788899}
9}
{}
{0{010121232343454565676787898}
99}
}
Well I'm still waiting for your Nimble package that offers an osproc alternative. (The rule should still be Nimble first, fusion second.)
@leorize ?
At the moment startProcess allows you to provide flags such as poParentStreams and poStdErrToStdOut to control the behaviour of the standard input and output streams of the new process. This limits the usage to two possible outcomes: either the child process inherits all of the parent's standard streams, or pipes are created by startProcess for all of the streams. There are no options for inheriting only some of the streams, or for passing in already created pipes to be used.
The API can be further enhanced by introducing the notion of process groups, which allows you to manage indirect sub-processes (e.g. when launching a script that will execute multiple sub-commands): https://www.boost.org/doc/libs/1_65_0/doc/html/boost_process/tutorial.html#boost_process.tutorial.group
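A minimal POSIX sketch of the process-group idea, again in Python (start_new_session is Python's spelling of setsid-in-the-child; this is an illustration of the mechanism, not a proposed Nim signature): the child becomes leader of a fresh group, so one signal reaches it and any sub-commands it spawned.

```python
import os
import signal
import subprocess
import sys

# start_new_session=True makes the child a new session/process-group leader,
# so signalling the group also reaches grandchildren (e.g. a script's sub-commands).
p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"],
                     start_new_session=True)
pgid = os.getpgid(p.pid)
os.killpg(pgid, signal.SIGTERM)  # one signal terminates the whole group
ret = p.wait()
```

Without the new group, killing only the direct child leaves its grandchildren running and reparented to init, which is exactly the leak the Boost.Process group API linked above is designed to prevent.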