Open zah opened 6 years ago
Another useful consideration is async friendliness, that is, ensuring all relevant handles are accessible so they can be wrapped into an async-flavoured API.
Partially related: execCmd / execCmdEx also have a different API from startProcess and execProcess; e.g. they support neither env nor a timeout.
/cc @zah as mentioned here (https://github.com/nim-lang/Nim/issues/9953#issuecomment-446951119) I like D's API for std.process: https://dlang.org/phobos/std_process.html, which is organized into low-level, mid-level, and high-level layers. Nim doesn't have the lowest-level one, and has a clunkier API that is less flexible (e.g. we can't redirect to a File, we can't selectively redirect, etc.), and has issues like https://github.com/nim-lang/Nim/issues/9953 and https://github.com/nim-lang/Nim/issues/9975.
I think we could take inspiration from that design, but with some modifications to take advantage of the Nim language. It's organized into (simplifying a bit):
# lowest level: just return Pid (containing process id, etc); take File inputs (defaulting to stdin/stdout/stderr), plus the rest
@trusted Pid spawnProcess(scope const(char[])[] args, File stdin = std.stdio.stdin, File stdout = std.stdio.stdout, File stderr = std.stdio.stderr, const string[string] env = null, Config config = Config.none, scope const char[] workDir = null);
# middle level: return pipes
@safe ProcessPipes pipeProcess(scope const(char[])[] args, Redirect redirect = Redirect.all, const string[string] env = null, Config config = Config.none, scope const(char)[] workDir = null);
# high level: return (exitCode, output)
@trusted auto execute(scope const(char[])[] args, const string[string] env = null, Config config = Config.none, size_t maxOutput = size_t.max, scope const(char)[] workDir = null);
in Nim this could be:
proc spawnProcess(cmd: string, args: seq[string] = @[], stdinFile = stdin, stdoutFile = stdout, stderrFile = stderr, env: Table[string, string] = nil, config = initConfig(), workDir = ""): Pid
proc pipeProcess(cmd: string, args: seq[string] = @[], redirect: Redirect, env: Table[string, string] = nil, config = initConfig(), workDir = ""): Pipes
proc execute(cmd: string, args: seq[string] = @[], env: Table[string, string] = nil, config = initConfig(), workDir = ""): (exitCode: int, output: string, error: string)
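To make the layering concrete, here is a minimal sketch in Python (used only because its subprocess module maps directly onto the three levels; the function names spawn_process / pipe_process / execute are illustrative, not a proposed Nim API): the low level returns just a process handle with streams defaulting to the parent's, the mid level returns the handle with all streams piped, and the high level runs to completion and returns the captured result.

```python
import subprocess
import sys

def spawn_process(args, stdin=None, stdout=None, stderr=None, env=None, cwd=None):
    # Low level: start the child and return only the handle;
    # None means that stream is inherited from the parent.
    return subprocess.Popen(args, stdin=stdin, stdout=stdout, stderr=stderr,
                            env=env, cwd=cwd)

def pipe_process(args, env=None, cwd=None):
    # Mid level: create pipes for all three standard streams,
    # leaving reading/writing to the caller.
    return subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, env=env, cwd=cwd)

def execute(args, env=None, cwd=None):
    # High level: run to completion, return (exit_code, stdout, stderr).
    p = pipe_process(args, env=env, cwd=cwd)
    out, err = p.communicate()
    return p.returncode, out, err

code, out, err = execute([sys.executable, "-c", "print('hi')"])
```

Each level is built on the one below it, which is the main attraction of the D design: the high-level convenience functions stop being a separate code path with their own quirks.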
if you've ever seen garbled cgen errors like this:
/Users/timothee/git_clone/nim/timn/build/nimcache/@mt11253.nim.c:94:2: error: use of /Users/timothee/git_clone/nim/timn/build/nimcache/@mt11253b.nim.cundeclared:87:2: error: identifieruse 'Foo'of undeclared
identifier 'Foo'
Foo a; Foo a;
^ ^
/Users/timothee/git_clone/nim/timn/build/nimcache/@mt11253b.nim.c:89:22: error: use of undeclared identifier 'a'
nimZeroMem((void*)(&a), sizeof(Foo));
it's related to this.
the problem is that poParentStreams is not so good when dealing with multiple child processes (execProcesses), as their stdout/stderr get interleaved (especially when partial lines are written), causing hard-to-interpret logs.
the fix is to implement proper stream capture, with a user-supplied callback that decides what to do with each captured line/chunk of stdout/stderr. The way to do that correctly is the same way you'd handle separate stdout/stderr streams: via the selectors API (which uses platform-specific APIs, e.g. kqueue on OSX); each time a pipe gets data, the user-supplied callback is invoked with that data. That way the user can also attribute the data to the stream/child it came from. It also doesn't block, nor does it hog memory, since it's up to the user callback to handle the data.
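As a rough POSIX sketch of that mechanism, here is the same idea in Python, whose selectors module parallels the Nim selectors API being referenced (run_with_callback and the (child_index, stream_name, data) callback signature are illustrative names, not an existing API): every child's stdout/stderr pipe is registered with one selector, and whenever a pipe has data the callback receives it tagged with its origin.

```python
import selectors
import subprocess
import sys

def run_with_callback(cmds, on_data):
    # Spawn all children with piped stdout/stderr and register every pipe.
    sel = selectors.DefaultSelector()
    procs = []
    for i, cmd in enumerate(cmds):
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        procs.append(p)
        sel.register(p.stdout, selectors.EVENT_READ, (i, "stdout"))
        sel.register(p.stderr, selectors.EVENT_READ, (i, "stderr"))
    open_streams = 2 * len(procs)
    while open_streams:
        for key, _ in sel.select():
            data = key.fileobj.read1(4096)  # one available chunk, no blocking read loop
            if data:
                on_data(*key.data, data)    # attribute chunk to (child, stream)
            else:                           # EOF on this pipe
                sel.unregister(key.fileobj)
                open_streams -= 1
    sel.close()
    return [p.wait() for p in procs]

chunks = []
codes = run_with_callback(
    [[sys.executable, "-c", "import sys; sys.stderr.write('{0123456789}\\n')"]] * 2,
    lambda i, stream, data: chunks.append((i, stream, data)))

# Reassemble per-child output from the tagged chunks.
per_child = {}
for i, stream, data in chunks:
    per_child[(i, stream)] = per_child.get((i, stream), b"") + data
```

Because every chunk arrives tagged, the caller can prefix each line with the child's index (or buffer per child and flush whole lines), which removes the interleaving problem entirely instead of working around it with n=1.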
# main.nim:
when true:
  import osproc, strformat, random, os
  proc main() =
    let file = currentSourcePath.parentDir / "t11254b.nim"
    let outFile = "/tmp/z01"
    const nim = getCurrentCompilerExe()
    doAssert execShellCmd(fmt"{nim} c -o:{outFile} {file}") == 0
    var cmds: seq[string]
    let n = 3
    for i in 0..<n:
      cmds.add outFile
    let res = execProcesses(cmds, {poUsePath, poParentStreams}) # garbled
    # let res = execProcesses(cmds, {poUsePath, poParentStreams}, n = 1) # ok
    doAssert res == 0
  main()
# t11254b.nim:
when true:
  proc main =
    for i in 0..<3:
      stderr.write "{"
      for j in 0..<10:
        stderr.write $j
      stderr.write "}\n"
  main()
output with n=1:
{0123456789}
{0123456789}
{0123456789}
{0123456789}
{0123456789}
{0123456789}
{0123456789}
{0123456789}
{0123456789}
output with n = countProcessors():
{{{000111222333444555666777888999}
}
}
{{{00011122233344455566677788899}
9}
{}
{0{010121232343454565676787898}
99}
}
Well I'm still waiting for your Nimble package that offers an osproc alternative. (The rule should still be Nimble first, fusion second.)
@leorize ?
At the moment startProcess allows you to provide flags such as poParentStreams and poStdErrToStdOut to control the behaviour of the standard input and output streams of the new process. This limits the usage to two possible outcomes: either the child process inherits all of the parent's standard streams, or pipes are created by startProcess for all of the streams. There are no options for inheriting only some of the streams, or for passing in already created pipes to be used.
The API can be further enhanced by introducing the notion of process groups, which allows you to manage indirect sub-processes (e.g. when launching a script that will execute multiple sub-commands): https://www.boost.org/doc/libs/1_65_0/doc/html/boost_process/tutorial.html#boost_process.tutorial.group
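A minimal POSIX sketch of the process-group idea, again in Python (start_new_session is Python's spelling of setsid-in-the-child; this is an illustration of the mechanism, not a proposed Nim signature): the child becomes leader of a fresh group, so one signal reaches it and any sub-commands it spawned.

```python
import os
import signal
import subprocess
import sys

# start_new_session=True makes the child a new session/process-group leader,
# so signalling the group also reaches grandchildren (e.g. a script's sub-commands).
p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"],
                     start_new_session=True)
pgid = os.getpgid(p.pid)
os.killpg(pgid, signal.SIGTERM)  # one signal terminates the whole group
ret = p.wait()
```

Without the new group, killing only the direct child leaves its grandchildren running and reparented to init, which is exactly the leak the Boost.Process group API linked above is designed to prevent.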