Closed haxscramper closed 2 years ago
type
ShellResult* = object
cmd*: ShellCmd
cwd*: AbsoluteDir ## Absolute path of initial command execution directory
retcode*: int ## Exit code
stderr*: string ## Stderr for command
stdout*: string ## Stdout for command
ShellCmdPartKind* = enum
cpkArgument ## String argument to command
cpkOption ## Key-value pair
cpkFlag ## Boolean flag (on/off)
cpkRaw ## Raw parameter
cpkTemplated ## Interpolated parameter
ShellCmdPart* = object
prefix*: string
key*: string
case kind*: ShellCmdPartKind
of cpkOption:
val*: string
sep*: string
else:
discard
ShellCmd* = object
bin: string
opts: seq[ShellCmdPart]
func flag*(cmd: var ShellCmd, fl: string, prefix: string = "-") =
## Add flag for command
cmd.opts.add ShellCmdPart(kind: cpkFlag, key: fl)
func opt*(cmd: var ShellCmd, key, sep, val: string, prefix: string = "--") =
cmd.opts.add ShellCmdPart(
kind: cpkOption, key: key, val: val, sep: sep, prefix: prefix)
func opt*(cmd: var ShellCmd, opts: openarray[tuple[key, val: string]], prefix: string = "--") =
## Add sequence of key-value pairs
for (key, val) in opts:
cmd.opt(key, "=", val, prefix)
func raw*(cmd: var ShellCmd, str: string) =
## Add raw string for command (for things like `+2` that are not
## covered by default options)
cmd.opts.add ShellCmdpart(kind: cpkRaw, key: str)
func arg*(cmd: var ShellCmd, arg: string) =
## Add argument for command
cmd.opts.add ShellCmdPart(kind: cpkArgument, key: arg)
func arg*(cmd: var ShellCmd, arg: int) = cmd.arg($arg)
func add*(cmd: var ShellCmd, part: ShellCmdPart) =
cmd.opts.add part
func shell*(bin: string): ShellCmd = ShellCmd(bin: bin)
func toStr*(part: ShellCmdPart): string =
result = part.prefix & part.key
case part.kind:
of cpkOption:
result.add part.sep
result.add part.val
else:
discard
func interpolate(parts: ShellCmdPart, interpolations: Table[string, seq[ShellCmdPart]]): seq[ShellCmdPart] =
for part in parts:
if part.kind == cpkTemplated:
result.add interpolations[part.key]
else:
result.add part
func toStr*(cmd: ShellCmd, interpolations: Table[string, seq[ShellCmdPart]]): seq[string] =
result = @[cmd.bin]
for part in cmd.opts.interpolate(interpolations):
result.add part.toStr()
This is an example API - it is pretty simple and provides only minimally necessary abstractions over the seq[string]
, but in that case overly complicated implementation is not specifically needed here.
func opt*(cmd: var ShellCmd, opts: openarray[tuple[key, val: string]], prefix: string = "--") =
## Add sequence of key-value pairs
for (key, val) in opts:
cmd.opt(key, "=", val, prefix)
I don't think this works. Even the C compiler, heavily used by nim has options like these:
-I/usr/local/include
-std=c++11
-o outfile
("gcc" "-o" "outfile")
-Wl,--defsym,__stack_limit=0x7ffe0000
(nested arguments)The first two would work, but -o
explicitly asks to split the key and the value into different arguments. Putting a space as the separator would not work becaues it would still be a single argument. The last one to pass arguments through the gcc
interface to the linker.
In the end I think I might be happier to not use abstractions.
I think the abstraction would be per shell escaped or potentially quoted string.
@saem I don't think the shell should be involved at all here. You can start processes, pass arguments, set environment variables all without using the shell. The shell is just syntax to do it on the command line.
https://man7.org/linux/man-pages/man3/exec.3.html
The abstraction provided by the operating system to start a process is a list of arguments, and a list of environment variables. Then there is stdin
stdout
and stderr
for input output streams and last but not least the return status code (single byte).
Abstractions like flags, options, named parameters, nested arguments (-Wl
for gcc) are all application specific, argument group seperators (e.g. --
for git) are all application specific abstractions. I prefer not to have them mixed up with universal operating system level parameter abstractions.
Maybe the better solution would be to dumb down abstraction even further, making it a seq[string]
of arguments, with arg()
procedure. Alternatively, we can make flag()
a one-dash variant, opt()
a two-dash variant, so the following code works:
-I/usr/local/include
cmd.flag("I", "/usr/local/include")
-std=c++11
cmd.flag("std", "=", "c++11")
-o outfile
("gcc" "-o" "outfile")
cmd.flag("o")
cmd.arg("outfile")
-Wl,--defsym,__stack_limit=0x7ffe0000
(nested arguments)
cmd.flag("W", "l,--defsym,__stack_limit=0x7ffe0000")
In the end, flag()
and opt()
can be a trivial helper function that does the string join for you. I just don't want to see the raw string interpolations mixed with quoteShell
here and there thrown in at random.
Also, I forgot about /
that is used as prefix in windows, of course it has to be different from everything else
I prefer not to have them mixed up with universal operating system level parameter abstractions.
int execl(const char *pathname, const char *arg, ...
/*, (char *) NULL */);
int execlp(const char *file, const char *arg, ...
/*, (char *) NULL */);
int execle(const char *pathname, const char *arg, ...
/*, (char *) NULL, char *const envp[] */);
int execv(const char *pathname, char *const argv[]);
int execvp(const char *file, char *const argv[]);
int execvpe(const char *file, char *const argv[], char *const envp[]);
In that case seq[string]
with minor helpers on top that would allow simplifying common tasks (like CLI command interpolation or construction) would be enough.
if linkTmpl.len == 0:
linkTmpl = CC[conf.cCompiler].linkTmpl
result.addf(linkTmpl, ["builddll", builddll,
"mapfile", mapfile,
"buildgui", buildgui, "options", linkOptions,
"objfiles", objfiles, "exefile", exefile,
"nim", quoteShell(getPrefixDir(conf)),
"lib", quoteShell(conf.libpath),
"vccplatform", vccplatform(conf)])
basically something better than abominations like these
linkTmpl: "$buildgui $builddll -Wl,-Map,$mapfile -o $exefile $objfiles $options",
compileTmpl: "-w -MMD -MP -MF $dfile -c $options $include -o $objfile $file",
linkTmpl: "$builddll$vccplatform /Fe$exefile $objfiles $buildgui /nologo $options",
I have no general problem with a specified command templating syntax, like it is done with $something
in the example above. But I do have a problem with it, when it isn't documented and specified pricesely how that string it treated.
$exefile
be a single argument?$options
be a single argument?$doesnotexist
?$options
be an empty argument, or no argument at all?(Please no "smart" solution)
If text is a single parameter, it stays as a single parameter and passed as a single parameter to the os process execution. gcc $options
were previously abused to interpolate multiple arguments then it just means interpolation needs to operate on multiple arguments, so $XXX
is replaced with seq[ShellCmdPart]
, just like I provided in my example.
I think there are two major parts to this:
The abstraction for the former needs to allow for composition with the abstraction for the latter.
All of the comments above are pointing in the same direction it seems. So at this point is just implementation and iterating in it including docs and tests.
This can be added and used in new stuff, with old implementation getting replaced in small pieces as we progress through.
@saem
This can be added and used in new stuff, with old implementation getting replaced in small pieces as we progress through.
@haxscramper The thing I would like to point out is, as much as I approve the idea that you want to clean the things up a bit, I do not like the direction this thing is heading. The point is the following:
list of strings
everywhere, no matter what language I am usinglist of strings
land again.Here is an example. The manual of gcc tells me to pass an argument of the form "-I/usr/local/include". So my goal is to pass that string to gcc. I don't care about a flag
abstraction, I want that string to be in the argument list, and the most readable form to do it, is with cmd.arg("-I/usr/local/include")
. This isn't just true for gcc, the manuals of all command line programs aren't written with these abstractions here in mind.
This here is cryptic to read
cmd.flag("I", "/usr/local/include")
cmd.flag("std", "=", "c++11")
cmd.flag("o")
cmd.arg("outfile")
cmd.flag("W", "l,--defsym,__stack_limit=0x7ffe0000")
arg
should be obvious, but why is there arg
and raw
, is there a difference?So whatever abstraction you provide, I would like to pass my command parameters as a list of string and nothing else.
cmd.args(["-I/usr/local/include", "-std=c++11", "-o", "outfile", "-Wl,--defsym,__stack_limit=0x7ffe0000"])
or maybe even like this
cmd.args(splitWhitespace("-I/usr/local/include -std=c++11 -o outfile -Wl,--defsym,__stack_limit=0x7ffe0000"))
When I know gcc, I know instantly what it does. When I don't know gcc, but I do know about starting processes, I can still look up with "-I" and "-o" does in gcc
.
I don't want to be the fun killer, but it feels like I have to be it here. I do not want abstractions. Especially I do not want abstractions for the sake of implementing abstractions. If abstractions are implemented, they need to provide real benefits, either syntactically, or structurally to prevent mistakes in the future. I do not see this here as given.
list of strings as the only abstraction
Given we don't even have this (and instead just have a string
Ok, we can just make it a binary+seq[arg] where arg is a string. Just like execve arguments
cmd.args(splitWhitespace("-I/usr/local/include
No, if you have a list of strings why would you need to joint it in the literal and then split on the whitespace?
Xkcd picture with standards
Yeah sure, why even refactor anything, just leave the garbage alone
cmd.args(["-I/usr/local/include
Not the worst thing in the world, we can live with this
No, if you have a list of strings why would you need to joint it in the literal and then split on the whitespace?
The example is still accepting a list of strings. I've just put it in there as an example of how to enable a command line feel without being cryptic about what is going on in the backend. I shows how a list of strings
API could be used , not how it should be used.
Not the worst thing in the world, we can live with this
yay
PS: I just want to point out that I do agree with you that the code examples you picked in the biginning are very bad and need to be addressed
does not escape potential spaces and other characters in src
. Therefore the value of src
is not preserved but interpreted.
Should probably return a list of strings.
string literals like " --version"
to append "--version"
(without space at the beginning) as an argument are just ... I have no words for it.
import std/[strutils, tables]
type
ShellResult* = object
cmd*: ShellCmd
cwd*: AbsoluteDir ## Absolute path of initial command execution directory
retcode*: int ## Exit code
stderr*: string ## Stderr for command
stdout*: string ## Stdout for command
ShellArgKind* = enum
cpkArgument ## String argument to command
cpkTemplated ## Interpolated parameter
ShellArg* = object
cmd*: string
kind*: ShellArgKind
ShellCmd* = object
bin*: string
opts*: seq[ShellArg]
func shArg*(arg: string): ShellArg = ShellArg(kind: cpkArgument, cmd: arg)
func shSub*(arg: string): ShellArg = ShellArg(kind: cpkTemplated, cmd: arg)
func shSub*(cmd: var ShellCmd, subs: openarray[string]) =
for sub in subs:
cmd.opts.add shSub(sub)
func args*(cmd: var ShellCmd, args: openarray[string]) =
## Add argument for command
for arg in args:
cmd.opts.add shArg(arg)
func arg*(cmd: var ShellCmd, format: string, args: varargs[string]) =
cmd.args([format(format, args)])
func shell*(bin: string, args: openarray[string] = @[]): ShellCmd =
result = ShellCmd(bin: bin)
result.args args
func add*(cmd: var ShellCmd, arg: ShellArg) = cmd.opts.add arg
func add*(cmd: var ShellCmd, args: openarray[ShellArg]) = cmd.opts.add args
func toStr*(part: ShellArg): string =
part.cmd
type
ShInterpolate* = Table[string, seq[ShellArg]]
func interpolate(
part: ShellArg,
interpolations: ShInterpolate = default(ShInterpolate)
): seq[ShellArg] =
if part.kind == cpkTemplated:
if part.cmd in interpolations:
result.add interpolations[part.cmd]
else:
result.add part
func argsToStr*(
cmd: ShellCmd,
interpolations: ShInterpolate = default(ShInterpolate)
): seq[string] =
for part in cmd.opts:
for interpol in part.interpolate(interpolations):
result.add interpol.toStr()
func toStr*(
cmd: ShellCmd,
interpolations: ShInterpolate = default(ShInterpolate)
): seq[string] =
result = @[cmd.bin] & cmd.argsToStr(interpolations)
block:
var gcc = shell("gcc")
for path in ["/usr/local/include", "/usr/include"]:
gcc.arg("-I$#", path)
gcc.args([
"-std=c++11",
"-o",
"outfile",
"-Wl,--defsym,__stack_limit=0x7ffe0000"
])
echo gcc.toStr()
type
ConfigRef = object
linkOptions: seq[ShellArg]
linkOptionsCmd: seq[ShellArg]
proc getLinkOptions(conf: ConfigRef): seq[ShellArg] =
result = conf.linkOptions & conf.linkOptionsCmd
var conf: ConfigRef
conf.linkOptions.add shArg("--random")
gcc.add conf.getLinkOptions()
echo gcc.toStr()
And something like this for running command
import std/[osproc, streams, os]
proc exec*(
cmd: ShellCmd,
workdir: string = "",
stdin: string = ""
): ShellResult =
let workdir = if len(workdir) == 0: getCurrentDir() else: workdir
result.cwd = AbsoluteDir(workdir)
var p = startProcess(
cmd.bin,
args = cmd.argsToStr(),
options = {poUsePath}
)
result.cmd = cmd
var outp = outputStream(p)
var outerr = errorStream(p)
if stdin.len > 0:
inputStream(p).write(stdin)
close inputStream(p)
result.retcode = -1
var line = newStringOfCap(120)
while true:
if outp.readLine(line):
result.stdout.add(line)
result.stdout.add("\n")
elif outerr.readLine(line):
result.stderr.add(line)
result.stderr.add("\n")
else:
result.retcode = peekExitCode(p)
if result.retcode != -1:
break
close(p)
block:
var gcc = shell("gcc", ["--version 123"])
echo gcc.exec()
(cmd: (bin: "gcc", opts: @[(cmd: "--version 123", kind: cpkArgument)]), cwd: "/tmp", retcode: 1, stderr: "gcc: error: unrecognized command-line option ‘--version 123’; did you mean ‘--version’?\ngcc: fatal error: no input files\ncompilation terminated.\n", stdout: "")
echo shell("nim", ["--eval=echo 12 + 2"]).exec()
(cmd: (bin: "nim", opts: @[(cmd: "--eval=echo 12 + 2", kind: cpkArgument)]), cwd: "/tmp", retcode: 0, stderr: "", stdout: "14\n")
For parallel execution (hacked on top of osproc, just to provide an example).
import std/posix
iterator exec*[T](
cmds: openarray[tuple[cmd: ShellCmd, data: T]],
fullParams: set[ProcessOption] = {poUsePath},
workdir: string = "",
maxPool: int = 8,
beforeStart: proc(cmd: ShellCmd, data: T) = nil
): tuple[res: ShellResult, data: T] =
assert 0 < maxPool
var processList: seq[Process] = newSeq[Process](maxPool)
var commandMap: seq[int] = newSeq[int](maxPool)
let maxPool = min(maxPool, cmds.len)
var leftCount = len(cmds)
var cmdIdx = 0
let workdir = if len(workdir) == 0: getCurrentDir() else: workdir
while cmdIdx < maxPool:
if not isNil(beforeStart):
beforeStart(cmds[cmdIdx].cmd, cmds[cmdIdx].data)
processList[cmdIdx] = start(
cmds[cmdIdx].cmd,
workdir = workdir,
options = fullParams
)
commandMap[cmdIdx] = cmdIdx
inc(cmdIdx)
while leftCount > 0:
var exitedIdx = -1
var status: cint = 1
let res: Pid = waitpid(-1, status, 0)
if res > 0:
for idx, process in mpairs(processList):
if not isNil(process) and process.processID() == res:
if WIFEXITED(status) or WIFSIGNALED(status):
# process.exitFlag = true
# process.exitStatus = status
exitedIdx = idx
break
else:
let err = osLastError()
if err == OSErrorCode(ECHILD):
# some child exits, we need to check our childs exit codes
for idx, process in mpairs(processList):
if (not isNil(process)) and (not running(process)):
# process.exitFlag = true
# process.exitStatus = status
exitedIdx = idx
break
elif err == OSErrorCode(EINTR):
# signal interrupted our syscall, lets repeat it
continue
else:
# all other errors are exceptions
raiseOSError(err)
if exitedIdx >= 0:
var res: ShellResult
let p = processList[exitedIdx]
res.cmd = cmds[commandMap[exitedIdx]].cmd
res.stdout = p.outputStream.readAll()
res.stderr = p.errorStream.readAll()
res.retcode = p.peekExitCode()
yield (res, cmds[commandMap[exitedIdx]].data)
close(processList[exitedIdx])
if cmdIdx < len(cmds):
if not isNil(beforeStart):
beforeStart(cmds[cmdIdx].cmd, cmds[cmdIdx].data)
processList[exitedIdx] = start(
cmds[cmdIdx].cmd,
workdir = workdir,
options = fullParams)
commandMap[exitedIdx] = cmdIdx
inc(cmdIdx)
else:
processList[exitedIdx] = nil
dec(leftCount)
for (res, data) in exec({
shell("echo", ["AA"]): "long",
shell("echo", ["BB"]): "all"
}):
echo ">>> ", data
echo res
- I am very much already used to list of strings as the only abstraction for passing arguments to subprocesses.
- I have to use list of strings everywhere, no matter what language I am using
- Whatever abstraction you provide, I won't be able to fully embrace it, because as soon as I leave the Nim bubble, I am back to list of strings land again.
@krux02 I need to clarify something about philosophy here. Bluntly, these arguments are entirely invalid. We'll not use these to make decisions. The other parts of the comment with respect to type modelling issues, error prevention, and such, that is totally fine.
For testament and a few other cases, I'm not sure aobut pulling out the std error/output is a good idea as ShellResult
encourages. 🤔
This is porbably totally fine for extcomp and main. Testament might be ok until needs grow.
@saem
I need to clarify something about philosophy here. Bluntly, these arguments are entirely invalid. We'll not use these to make decisions. The other parts of the comment with respect to type modelling issues, error prevention, and such, that is totally fine.
Can you elabotae on this. What do you want to clarify about philosophy here? How are they entirely invalid? No problem with being blunt here, as I was blunt as well, so sorry for that.
Can you elabotae on this. What do you want to clarify about philosophy here? How are they entirely invalid? No problem with being blunt here, as I was blunt as well, so sorry for that.
Yeah, I didn't like how I phrased that, was playing with the wording any how, thanks for taking it charitably.
What I mean about the philosophy is that we have a genuine change to restart with nimskull, let's not simply get stuck with decisions that got stuck in other languages here. I hope that makes sense.
So in order to do that, let's avoid conventions established in other languages where those conventions might well existing because of the poor abstraction capabilities in those languages. So more specifically, lots of languages have strings, and yeah there are lots of issues with trying to create a good set of types over strings that model the CLI invocations.
But... let's not be defeatist, there are some pretty good type abstractions already available to us and let's not shoot ourselves in the foot because others have done the same before. I'm not going to say there zero benefits to following the herd, there are lots! But let's follow the herd for where things are an improvement and rework for where things are not.
In this case the herd is regressing to flinging strings around which can also have security implications, but I digress, so the herd is towards a regression. In other cases, let's saying testing library design, like OMG I would kill for something a lot more like JUnit, PHPUnit, RSpec, Mocha, KTest, ... you name it and it's probably better. 😆
That's a lot of words, sorry. I hope that helped; thanks for reading. I'm happy to talk more we can discuss in chat, a discussion, or whatever.
Most of the commands executed in the compiler go through the
execProcesses*(cmds: openArray[string],
orexecCmdEx*(command: string,
and as a consequence, most of the code handles strings by immediately joining them, using string template interpolation all over the place. While by itself it is not too bad, it makes it harder to reason about the code, because semi-random string joins all over the place, likehttps://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/compiler/backend/extccomp.nim#L357-L359
https://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/compiler/backend/extccomp.nim#L491-L492
https://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/compiler/backend/extccomp.nim#L539
As well as weird handling of the shell execution results, where (again), premature string formatting completely conceals the purpose and data flow in the code.
https://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/testament/testament.nim#L474
https://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/testament/specs.nim#L120-L124
https://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/testament/testament.nim#L179-L187
Instead, several helper types should be introduced to abstract away the command handling logic -
ShellCmd
,ShellResult
,ShellCmdPart
.https://github.com/nim-works/nimskull/blob/829ffa1fb6a557bd7e65485383273bafcbde2378/compiler/front/main.nim#L84-L93
https://github.com/nim-works/nimskull/blob/829ffa1fb6a557bd7e65485383273bafcbde2378/compiler/backend/extccomp.nim#L416-L418
https://github.com/nim-works/nimskull/blob/829ffa1fb6a557bd7e65485383273bafcbde2378/compiler/backend/extccomp.nim#L235-L240
https://github.com/nim-works/nimskull/blob/829ffa1fb6a557bd7e65485383273bafcbde2378/compiler/backend/extccomp.nim#L112