nim-works / nimskull

An in development statically typed systems programming language; with sustainability at its core. We, the community of users, maintain it.
https://nim-works.github.io/nimskull/index.html
Other
277 stars 39 forks source link

Move away from the `string` for handling executed commands #213

Closed haxscramper closed 2 years ago

haxscramper commented 2 years ago

Most of the commands executed in the compiler go through the execProcesses*(cmds: openArray[string], or execCmdEx*(command: string, and as a consequence, most of the code handles strings by immediately joining them, using string template interpolation all over the place. While by itself it is not too bad, it makes it harder to reason about the code, because semi-random string joins all over the place, like

https://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/compiler/backend/extccomp.nim#L357-L359

https://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/compiler/backend/extccomp.nim#L491-L492

https://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/compiler/backend/extccomp.nim#L539

As well as weird handling of the shell execution results, where (again), premature string formatting completely conceals the purpose and data flow in the code.

https://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/testament/testament.nim#L474

https://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/testament/specs.nim#L120-L124

https://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/testament/testament.nim#L179-L187

Instead, several helper types should be introduced to abstract away the command handling logic - ShellCmd, ShellResult, ShellCmdPart.

https://github.com/nim-works/nimskull/blob/829ffa1fb6a557bd7e65485383273bafcbde2378/compiler/front/main.nim#L84-L93

https://github.com/nim-works/nimskull/blob/829ffa1fb6a557bd7e65485383273bafcbde2378/compiler/backend/extccomp.nim#L416-L418

https://github.com/nim-works/nimskull/blob/829ffa1fb6a557bd7e65485383273bafcbde2378/compiler/backend/extccomp.nim#L235-L240

https://github.com/nim-works/nimskull/blob/829ffa1fb6a557bd7e65485383273bafcbde2378/compiler/backend/extccomp.nim#L112

haxscramper commented 2 years ago
type
  ShellResult* = object
    cmd*: ShellCmd
    cwd*: AbsoluteDir ## Absolute path of initial command execution directory
    retcode*: int ## Exit code
    stderr*: string ## Stderr for command
    stdout*: string ## Stdout for command

  ShellCmdPartKind* = enum
    cpkArgument ## String argument to command
    cpkOption ## Key-value pair
    cpkFlag ## Boolean flag (on/off)
    cpkRaw ## Raw parameter
    cpkTemplated ## Interpolated parameter

  ShellCmdPart* = object
    prefix*: string
    key*: string
    case kind*: ShellCmdPartKind
      of cpkOption:
        val*: string
        sep*: string

      else:
        discard

  ShellCmd* = object
    bin: string
    opts: seq[ShellCmdPart]

func flag*(cmd: var ShellCmd, fl: string, prefix: string = "-") =
  ## Add flag for command
  cmd.opts.add ShellCmdPart(kind: cpkFlag, key: fl)

func opt*(cmd: var ShellCmd, key, sep, val: string, prefix: string = "--") =
  cmd.opts.add ShellCmdPart(
    kind: cpkOption, key: key, val: val, sep: sep, prefix: prefix)

func opt*(cmd: var ShellCmd, opts: openarray[tuple[key, val: string]], prefix: string = "--") =
  ## Add sequence of key-value pairs
  for (key, val) in opts:
    cmd.opt(key, "=", val, prefix)

func raw*(cmd: var ShellCmd, str: string) =
  ## Add raw string for command (for things like `+2` that are not
  ## covered by default options)
  cmd.opts.add ShellCmdpart(kind: cpkRaw, key: str)

func arg*(cmd: var ShellCmd, arg: string) =
  ## Add argument for command
  cmd.opts.add ShellCmdPart(kind: cpkArgument, key: arg)

func arg*(cmd: var ShellCmd, arg: int) = cmd.arg($arg)

func add*(cmd: var ShellCmd, part: ShellCmdPart) =
  cmd.opts.add part

func shell*(bin: string): ShellCmd = ShellCmd(bin: bin)

func toStr*(part: ShellCmdPart): string =
  result = part.prefix & part.key
  case part.kind:
    of cpkOption:
      result.add part.sep
      result.add part.val

    else:
      discard

func interpolate(parts: ShellCmdPart, interpolations: Table[string, seq[ShellCmdPart]]): seq[ShellCmdPart] = 
  for part in parts:
    if part.kind == cpkTemplated:
      result.add interpolations[part.key]

    else:
      result.add part

func toStr*(cmd: ShellCmd, interpolations: Table[string, seq[ShellCmdPart]]): seq[string] =
  result = @[cmd.bin]
  for part in cmd.opts.interpolate(interpolations):
    result.add part.toStr()

This is an example API - it is pretty simple and provides only minimally necessary abstractions over the seq[string], but in that case overly complicated implementation is not specifically needed here.

krux02 commented 2 years ago
func opt*(cmd: var ShellCmd, opts: openarray[tuple[key, val: string]], prefix: string = "--") =
  ## Add sequence of key-value pairs
  for (key, val) in opts:
    cmd.opt(key, "=", val, prefix)

I don't think this works. Even the C compiler, heavily used by nim has options like these:

The first two would work, but -o explicitly asks to split the key and the value into different arguments. Putting a space as the separator would not work becaues it would still be a single argument. The last one to pass arguments through the gcc interface to the linker.

krux02 commented 2 years ago

In the end I think I might be happier to not use abstractions.

saem commented 2 years ago

I think the abstraction would be per shell escaped or potentially quoted string.

krux02 commented 2 years ago

@saem I don't think the shell should be involved at all here. You can start processes, pass arguments, set environment variables all without using the shell. The shell is just syntax to do it on the command line.

https://man7.org/linux/man-pages/man3/exec.3.html

The abstraction provided by the operating system to start a process is a list of arguments, and a list of environment variables. Then there is stdin stdout and stderr for input output streams and last but not least the return status code (single byte).

Abstractions like flags, options, named parameters, nested arguments (-Wl for gcc) are all application specific, argument group seperators (e.g. -- for git) are all application specific abstractions. I prefer not to have them mixed up with universal operating system level parameter abstractions.

haxscramper commented 2 years ago

Maybe the better solution would be to dumb down abstraction even further, making it a seq[string] of arguments, with arg() procedure. Alternatively, we can make flag() a one-dash variant, opt() a two-dash variant, so the following code works:

  • -I/usr/local/include
cmd.flag("I", "/usr/local/include")
  • -std=c++11
cmd.flag("std", "=", "c++11")
  • -o outfile ("gcc" "-o" "outfile")
cmd.flag("o")
cmd.arg("outfile")
  • -Wl,--defsym,__stack_limit=0x7ffe0000 (nested arguments)
cmd.flag("W", "l,--defsym,__stack_limit=0x7ffe0000")

In the end, flag() and opt() can be a trivial helper function that does the string join for you. I just don't want to see the raw string interpolations mixed with quoteShell here and there thrown in at random.

Also, I forgot about / that is used as prefix in windows, of course it has to be different from everything else

haxscramper commented 2 years ago

I prefer not to have them mixed up with universal operating system level parameter abstractions.

       int execl(const char *pathname, const char *arg, ...
                       /*, (char *) NULL */);
       int execlp(const char *file, const char *arg, ...
                       /*, (char *) NULL */);
       int execle(const char *pathname, const char *arg, ...
                       /*, (char *) NULL, char *const envp[] */);
       int execv(const char *pathname, char *const argv[]);
       int execvp(const char *file, char *const argv[]);
       int execvpe(const char *file, char *const argv[], char *const envp[]);

In that case seq[string] with minor helpers on top that would allow simplifying common tasks (like CLI command interpolation or construction) would be enough.

    if linkTmpl.len == 0:
      linkTmpl = CC[conf.cCompiler].linkTmpl
    result.addf(linkTmpl, ["builddll", builddll,
        "mapfile", mapfile,
        "buildgui", buildgui, "options", linkOptions,
        "objfiles", objfiles, "exefile", exefile,
        "nim", quoteShell(getPrefixDir(conf)),
        "lib", quoteShell(conf.libpath),
        "vccplatform", vccplatform(conf)])

basically something better than abominations like these

linkTmpl: "$buildgui $builddll -Wl,-Map,$mapfile -o $exefile $objfiles $options",
compileTmpl: "-w -MMD -MP -MF $dfile -c $options $include -o $objfile $file",
linkTmpl: "$builddll$vccplatform /Fe$exefile $objfiles $buildgui /nologo $options",
krux02 commented 2 years ago

I have no general problem with a specified command templating syntax, like it is done with $something in the example above. But I do have a problem with it, when it isn't documented and specified pricesely how that string it treated.

(Please no "smart" solution)

haxscramper commented 2 years ago

If text is a single parameter, it stays as a single parameter and passed as a single parameter to the os process execution. gcc $options were previously abused to interpolate multiple arguments then it just means interpolation needs to operate on multiple arguments, so $XXX is replaced with seq[ShellCmdPart], just like I provided in my example.

saem commented 2 years ago

I think there are two major parts to this:

The abstraction for the former needs to allow for composition with the abstraction for the latter.

saem commented 2 years ago

All of the comments above are pointing in the same direction it seems. So at this point is just implementation and iterating in it including docs and tests.

haxscramper commented 2 years ago

This can be added and used in new stuff, with old implementation getting replaced in small pieces as we progress through.

krux02 commented 2 years ago

@saem

This can be added and used in new stuff, with old implementation getting replaced in small pieces as we progress through.

Standards

@haxscramper The thing I would like to point out is, as much as I approve the idea that you want to clean the things up a bit, I do not like the direction this thing is heading. The point is the following:

Here is an example. The manual of gcc tells me to pass an argument of the form "-I/usr/local/include". So my goal is to pass that string to gcc. I don't care about a flag abstraction, I want that string to be in the argument list, and the most readable form to do it, is with cmd.arg("-I/usr/local/include"). This isn't just true for gcc, the manuals of all command line programs aren't written with these abstractions here in mind.

This here is cryptic to read

cmd.flag("I", "/usr/local/include")
cmd.flag("std", "=", "c++11")
cmd.flag("o")
cmd.arg("outfile")
cmd.flag("W", "l,--defsym,__stack_limit=0x7ffe0000")

So whatever abstraction you provide, I would like to pass my command parameters as a list of string and nothing else.

cmd.args(["-I/usr/local/include", "-std=c++11", "-o",  "outfile", "-Wl,--defsym,__stack_limit=0x7ffe0000"])

or maybe even like this

cmd.args(splitWhitespace("-I/usr/local/include -std=c++11 -o outfile -Wl,--defsym,__stack_limit=0x7ffe0000"))

When I know gcc, I know instantly what it does. When I don't know gcc, but I do know about starting processes, I can still look up with "-I" and "-o" does in gcc.

I don't want to be the fun killer, but it feels like I have to be it here. I do not want abstractions. Especially I do not want abstractions for the sake of implementing abstractions. If abstractions are implemented, they need to provide real benefits, either syntactically, or structurally to prevent mistakes in the future. I do not see this here as given.

haxscramper commented 2 years ago

list of strings as the only abstraction

Given we don't even have this (and instead just have a string

Ok, we can just make it a binary+seq[arg] where arg is a string. Just like execve arguments

cmd.args(splitWhitespace("-I/usr/local/include

No, if you have a list of strings why would you need to joint it in the literal and then split on the whitespace?

Xkcd picture with standards

Yeah sure, why even refactor anything, just leave the garbage alone

cmd.args(["-I/usr/local/include

Not the worst thing in the world, we can live with this

krux02 commented 2 years ago

No, if you have a list of strings why would you need to joint it in the literal and then split on the whitespace?

The example is still accepting a list of strings. I've just put it in there as an example of how to enable a command line feel without being cryptic about what is going on in the backend. I shows how a list of strings API could be used , not how it should be used.

Not the worst thing in the world, we can live with this

yay

PS: I just want to point out that I do agree with you that the code examples you picked in the biginning are very bad and need to be addressed

https://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/compiler/backend/extccomp.nim#L357-L359

does not escape potential spaces and other characters in src. Therefore the value of src is not preserved but interpreted.

https://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/compiler/backend/extccomp.nim#L491-L492

Should probably return a list of strings.

https://github.com/nim-works/nimskull/blob/afbd5fe6fb4882d2094069b53a877cc1997f3a32/compiler/backend/extccomp.nim#L539

string literals like " --version" to append "--version" (without space at the beginning) as an argument are just ... I have no words for it.

haxscramper commented 2 years ago
import std/[strutils, tables]

type
  ShellResult* = object
    cmd*: ShellCmd
    cwd*: AbsoluteDir ## Absolute path of initial command execution directory
    retcode*: int ## Exit code
    stderr*: string ## Stderr for command
    stdout*: string ## Stdout for command

  ShellArgKind* = enum
    cpkArgument ## String argument to command
    cpkTemplated ## Interpolated parameter

  ShellArg* = object
    cmd*: string
    kind*: ShellArgKind

  ShellCmd* = object
    bin*: string
    opts*: seq[ShellArg]

func shArg*(arg: string): ShellArg = ShellArg(kind: cpkArgument, cmd: arg)
func shSub*(arg: string): ShellArg = ShellArg(kind: cpkTemplated, cmd: arg)

func shSub*(cmd: var ShellCmd, subs: openarray[string]) =
  for sub in subs:
    cmd.opts.add shSub(sub)

func args*(cmd: var ShellCmd, args: openarray[string]) =
  ## Add argument for command
  for arg in args:
    cmd.opts.add shArg(arg)

func arg*(cmd: var ShellCmd, format: string, args: varargs[string]) =
  cmd.args([format(format, args)])

func shell*(bin: string, args: openarray[string] = @[]): ShellCmd =
  result = ShellCmd(bin: bin)
  result.args args

func add*(cmd: var ShellCmd, arg: ShellArg) = cmd.opts.add arg
func add*(cmd: var ShellCmd, args: openarray[ShellArg]) = cmd.opts.add args

func toStr*(part: ShellArg): string =
  part.cmd

type
  ShInterpolate* = Table[string, seq[ShellArg]]

func interpolate(
    part: ShellArg,
    interpolations: ShInterpolate = default(ShInterpolate)
  ): seq[ShellArg] =
  if part.kind == cpkTemplated:
    if part.cmd in interpolations:
      result.add interpolations[part.cmd]

  else:
    result.add part

func argsToStr*(
    cmd: ShellCmd,
    interpolations: ShInterpolate = default(ShInterpolate)
  ): seq[string] =
  for part in cmd.opts:
    for interpol in part.interpolate(interpolations):
      result.add interpol.toStr()

func toStr*(
    cmd: ShellCmd,
    interpolations: ShInterpolate = default(ShInterpolate)
  ): seq[string] =
  result = @[cmd.bin] & cmd.argsToStr(interpolations)

block:
  var gcc = shell("gcc")
  for path in ["/usr/local/include", "/usr/include"]:
    gcc.arg("-I$#", path)

  gcc.args([
    "-std=c++11",
    "-o",
    "outfile",
    "-Wl,--defsym,__stack_limit=0x7ffe0000"
  ])

  echo gcc.toStr()

  type
    ConfigRef = object
      linkOptions: seq[ShellArg]
      linkOptionsCmd: seq[ShellArg]

  proc getLinkOptions(conf: ConfigRef): seq[ShellArg] =
    result = conf.linkOptions & conf.linkOptionsCmd

  var conf: ConfigRef
  conf.linkOptions.add shArg("--random")

  gcc.add conf.getLinkOptions()
  echo gcc.toStr()
haxscramper commented 2 years ago

And something like this for running command

import std/[osproc, streams, os]

proc exec*(
    cmd: ShellCmd,
    workdir: string = "",
    stdin: string = ""
  ): ShellResult =
  let workdir = if len(workdir) == 0: getCurrentDir() else: workdir
  result.cwd = AbsoluteDir(workdir)
  var p = startProcess(
    cmd.bin,
    args = cmd.argsToStr(),
    options = {poUsePath}
  )

  result.cmd = cmd
  var outp = outputStream(p)
  var outerr = errorStream(p)

  if stdin.len > 0:
    inputStream(p).write(stdin)

  close inputStream(p)

  result.retcode = -1
  var line = newStringOfCap(120)
  while true:
    if outp.readLine(line):
      result.stdout.add(line)
      result.stdout.add("\n")

    elif outerr.readLine(line):
      result.stderr.add(line)
      result.stderr.add("\n")

    else:
      result.retcode = peekExitCode(p)
      if result.retcode != -1:
        break

  close(p)
block:
  var gcc = shell("gcc", ["--version 123"])
  echo gcc.exec()
(cmd: (bin: "gcc", opts: @[(cmd: "--version 123", kind: cpkArgument)]), cwd: "/tmp", retcode: 1, stderr: "gcc: error: unrecognized command-line option ‘--version 123’; did you mean ‘--version’?\ngcc: fatal error: no input files\ncompilation terminated.\n", stdout: "")
echo shell("nim", ["--eval=echo 12 + 2"]).exec()
(cmd: (bin: "nim", opts: @[(cmd: "--eval=echo 12 + 2", kind: cpkArgument)]), cwd: "/tmp", retcode: 0, stderr: "", stdout: "14\n")
haxscramper commented 2 years ago

For parallel execution (hacked on top of osproc, just to provide an example).

import std/posix

iterator exec*[T](
    cmds: openarray[tuple[cmd: ShellCmd, data: T]],
    fullParams: set[ProcessOption] = {poUsePath},
    workdir: string = "",
    maxPool: int = 8,
    beforeStart: proc(cmd: ShellCmd, data: T) = nil
  ): tuple[res: ShellResult, data: T] =

  assert 0 < maxPool

  var processList: seq[Process] = newSeq[Process](maxPool)
  var commandMap: seq[int] = newSeq[int](maxPool)

  let maxPool = min(maxPool, cmds.len)

  var leftCount = len(cmds)
  var cmdIdx = 0

  let workdir = if len(workdir) == 0: getCurrentDir() else: workdir

  while cmdIdx < maxPool:
    if not isNil(beforeStart):
      beforeStart(cmds[cmdIdx].cmd, cmds[cmdIdx].data)

    processList[cmdIdx] = start(
      cmds[cmdIdx].cmd,
      workdir = workdir,
      options = fullParams
    )
    commandMap[cmdIdx] = cmdIdx
    inc(cmdIdx)

  while leftCount > 0:
    var exitedIdx = -1
    var status: cint = 1
    let res: Pid = waitpid(-1, status, 0)
    if res > 0:
      for idx, process in mpairs(processList):
        if not isNil(process) and process.processID() == res:
          if WIFEXITED(status) or WIFSIGNALED(status):
            # process.exitFlag = true
            # process.exitStatus = status
            exitedIdx = idx
            break

    else:
      let err = osLastError()
      if err == OSErrorCode(ECHILD):
        # some child exits, we need to check our childs exit codes
        for idx, process in mpairs(processList):
          if (not isNil(process)) and (not running(process)):
            # process.exitFlag = true
            # process.exitStatus = status
            exitedIdx = idx
            break

      elif err == OSErrorCode(EINTR):
        # signal interrupted our syscall, lets repeat it
        continue

      else:
        # all other errors are exceptions
        raiseOSError(err)

    if exitedIdx >= 0:
      var res: ShellResult

      let p = processList[exitedIdx]

      res.cmd = cmds[commandMap[exitedIdx]].cmd
      res.stdout = p.outputStream.readAll()
      res.stderr = p.errorStream.readAll()
      res.retcode = p.peekExitCode()
      yield (res, cmds[commandMap[exitedIdx]].data)

      close(processList[exitedIdx])
      if cmdIdx < len(cmds):
        if not isNil(beforeStart):
          beforeStart(cmds[cmdIdx].cmd, cmds[cmdIdx].data)

        processList[exitedIdx] = start(
          cmds[cmdIdx].cmd,
          workdir = workdir,
          options = fullParams)

        commandMap[exitedIdx] = cmdIdx
        inc(cmdIdx)

      else:
        processList[exitedIdx] = nil

      dec(leftCount)

for (res, data) in exec({
  shell("echo", ["AA"]): "long",
  shell("echo", ["BB"]): "all"
}):
  echo ">>> ", data
  echo res
saem commented 2 years ago
  • I am very much already used to list of strings as the only abstraction for passing arguments to subprocesses.
  • I have to use list of strings everywhere, no matter what language I am using
  • Whatever abstraction you provide, I won't be able to fully embrace it, because as soon as I leave the Nim bubble, I am back to list of strings land again.

@krux02 I need to clarify something about philosophy here. Bluntly, these arguments are entirely invalid. We'll not use these to make decisions. The other parts of the comment with respect to type modelling issues, error prevention, and such, that is totally fine.

saem commented 2 years ago

For testament and a few other cases, I'm not sure aobut pulling out the std error/output is a good idea as ShellResult encourages. 🤔

This is porbably totally fine for extcomp and main. Testament might be ok until needs grow.

krux02 commented 2 years ago

@saem

I need to clarify something about philosophy here. Bluntly, these arguments are entirely invalid. We'll not use these to make decisions. The other parts of the comment with respect to type modelling issues, error prevention, and such, that is totally fine.

Can you elabotae on this. What do you want to clarify about philosophy here? How are they entirely invalid? No problem with being blunt here, as I was blunt as well, so sorry for that.

saem commented 2 years ago

Can you elabotae on this. What do you want to clarify about philosophy here? How are they entirely invalid? No problem with being blunt here, as I was blunt as well, so sorry for that.

Yeah, I didn't like how I phrased that, was playing with the wording any how, thanks for taking it charitably.

What I mean about the philosophy is that we have a genuine change to restart with nimskull, let's not simply get stuck with decisions that got stuck in other languages here. I hope that makes sense.

So in order to do that, let's avoid conventions established in other languages where those conventions might well existing because of the poor abstraction capabilities in those languages. So more specifically, lots of languages have strings, and yeah there are lots of issues with trying to create a good set of types over strings that model the CLI invocations.

But... let's not be defeatist, there are some pretty good type abstractions already available to us and let's not shoot ourselves in the foot because others have done the same before. I'm not going to say there zero benefits to following the herd, there are lots! But let's follow the herd for where things are an improvement and rework for where things are not.

In this case the herd is regressing to flinging strings around which can also have security implications, but I digress, so the herd is towards a regression. In other cases, let's saying testing library design, like OMG I would kill for something a lot more like JUnit, PHPUnit, RSpec, Mocha, KTest, ... you name it and it's probably better. 😆

That's a lot of words, sorry. I hope that helped; thanks for reading. I'm happy to talk more we can discuss in chat, a discussion, or whatever.