Allow specifying shell/interpreter

Is your feature request related to a problem? Please describe.

Using the local default shell for tusk files hurts portability. It is especially problematic for users who are not using a Bourne-compatible shell (e.g. users using fish) if the tusk file assumes a Bourne-compatible variant.

Describe the solution you'd like

To opt in to somewhat better portability, it should be possible to specify the shell interpreter to use, either at the task level or as a default for the whole tusk file (used by all tasks that don't override it).

interpreter: /usr/bin/sh # override default

tasks:
    foo:
        interpreter: /usr/bin/bash # override default

The per-task interpreter definition is especially important when using "include" for tasks.

The specified interpreter must be used not only for run members, but for any commands run as part of setting defaults for options and arguments as well.

Additional context

In an environment where some users have e.g. "fish", some "bash" and some "zsh" as their default shell, being able to specify in tusk which shell interpreter to use would be extremely useful. This is not about offering any form of complete portability, but about being able to improve it.

Don't forget powershell... the only true cross platform shell environment.

I think this is an important feature, but I'm a little uncertain how to make sure we get it right.

Currently, tusk will do the following:

1. If the env var SHELL is set, call $SHELL -c "${contents_of_script}"
2. Otherwise, run sh -c "${contents_of_script}"
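A minimal Go sketch of the selection logic just described, for illustration only. The function name shellArgv and the injected lookup parameter are hypothetical and not taken from tusk's source; treating an empty SHELL as unset is also an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// shellArgv returns the argv that would be executed for a script:
// $SHELL -c <script> when SHELL is set, otherwise sh -c <script>.
// lookup is injected (normally os.LookupEnv) so the logic is easy to test.
// Assumption: an empty SHELL value is treated the same as an unset one.
func shellArgv(lookup func(string) (string, bool), script string) []string {
	shell := "sh"
	if v, ok := lookup("SHELL"); ok && v != "" {
		shell = v
	}
	return []string{shell, "-c", script}
}

func main() {
	// Prints the command line chosen for the current environment.
	fmt.Printf("%q\n", shellArgv(os.LookupEnv, "echo hello"))
}
```

This makes the portability problem visible: the same script is handed to whatever the user's login shell happens to be.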

It's not really ideal—it's just what I happened to get working first that was good enough. The question I have is, how do we make this generic in a way that can be useful and stable long-term? The fewer times we have to break strict backward compatibility (i.e., any script that has ever worked continues to work the same way), the better.

Starting with POSIX-compatible shells, supporting this is easy. We could define an interpreter clause, and do ${interpreter} -c "${contents_of_script}". This covers any sh-variant, like zsh or bash, with no extra effort. We're actually lucky here; we get python for free, since it respects the same ${interpreter} -c "${contents}" format.

If we wanted to support other scripting languages like node or ruby, we can't follow that structure, because they happen to use a different letter for the same flag. I've never used powershell, but some googling leads me to believe it doesn't follow the -c pattern either. In theory we might want to support a language like Go as well, but as far as I know it offers no way to run code that doesn't live in a file.

So we have a few options here:

1. Only officially support "shell-like" programs, meaning anything that meets the ${interpreter} -c "${contents_of_script}" pattern
2. Figure out a syntax to specify the full invocation, so node -e "${contents_of_script}" is possible. Probably something like interpreter: [sh, -c], although possibly with a different name.
3. Explicitly whitelist a series of known interpreters, which tusk knows how to call. For languages that are as simple as ${interpreter} ${arbitrary_flags} "${command_to_interpret}", we can add these liberally.
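To make the "full invocation" option concrete, here is a rough Go sketch of how a list-valued interpreter such as [sh, -c] or [node, -e] could be turned into a command line. The name buildArgv is hypothetical, and this is only one way the idea might be realized.

```go
package main

import (
	"errors"
	"fmt"
)

// buildArgv appends the script to an interpreter invocation such as
// ["sh", "-c"] or ["node", "-e"], producing the full argv to execute.
// Hypothetical helper: it illustrates the option, not tusk's behavior.
func buildArgv(interpreter []string, script string) ([]string, error) {
	if len(interpreter) == 0 {
		return nil, errors.New("interpreter must name at least an executable")
	}
	// Copy first so the caller's slice is never modified by append.
	argv := make([]string, 0, len(interpreter)+1)
	argv = append(argv, interpreter...)
	return append(argv, script), nil
}

func main() {
	argv, err := buildArgv([]string{"node", "-e"}, "console.log('hi')")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", argv)
}
```

With this shape, the flag letter is the tusk file author's responsibility, so tusk itself needs no knowledge of individual interpreters.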

I'm sold on the idea of a global setting, whatever it looks like. For per-task or per-command settings, there's a question of overrides for multi-system compatibility. For example, what if a tusk.yml specifies /bin/bash, but I want to run /usr/local/bin/bash because the version at /bin/bash is horribly out of date? If we only have a global setting for a whole tusk.yml, we can always override with a TUSK_INTERPRETER env var or --interpreter CLI flag—that was basically the idea with supporting $SHELL. I don't know how that would look for task-local settings, though. I might be more in favor of something like this, although I'm not sure if that plays as well with include, since filepaths are relative to the root:

tasks:
  my-task:
    script: ./cmd-with-shebang.sh
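For the global-setting case discussed above, the override idea (a TUSK_INTERPRETER env var or an --interpreter CLI flag) could be sketched in Go as follows. The precedence order of flag, then environment, then tusk.yml, then sh is an assumption made for illustration, and pickInterpreter is a hypothetical name.

```go
package main

import "fmt"

// pickInterpreter returns the first non-empty candidate in this order:
// CLI flag, environment variable, value from tusk.yml, and finally "sh".
// Assumption: this precedence is illustrative; the thread does not fix one.
func pickInterpreter(flagValue, envValue, fileValue string) string {
	for _, candidate := range []string{flagValue, envValue, fileValue} {
		if candidate != "" {
			return candidate
		}
	}
	return "sh"
}

func main() {
	// tusk.yml asks for /bin/bash, but the user overrides it via the env var.
	fmt.Println(pickInterpreter("", "/usr/local/bin/bash", "/bin/bash"))
}
```

A single global value keeps this simple; task-local interpreters would need a way to say which task an override applies to, which is the open question raised above.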

I've been getting away with writing scripts that happen to be POSIX compliant and never running on a Windows machine. @smyrman @airtonix I'm curious if you had thoughts, considering this is a use case you've run into.

I'm curious if you had thoughts, considering this is a use case you've run into.

Shell scripts have a minimum portability guarantee, in that the interpreter to use is described on the first line (#! syntax). This is really a minimum guarantee, and I think tusk should be able to have a similar guarantee.
I like tusk, it's very awesome. However, I don't think using SHELL when set was a good idea. This variable is generally set to describe a user's default shell (at least on macOS). That information is of little interest here; what we care about is the shell that was intended by the tusk file author (or the one most compatible with it). Just changing to always use /bin/sh and ignore the SHELL variable would be a huge improvement for us. However, if an interpreter can be specified explicitly, we wouldn't have to break backwards compatibility. Tusk could keep using SHELL, but only as a fallback when the tusk file interpreter is not explicitly set.
Using /usr/bin/bash over /bin/bash might make sense (it's not a priority to us), but it should not be set up so that /usr/bin/bash ends up being used instead of e.g. /usr/bin/env python3. A solution to this could be that interpreters are described as bash or sh by convention (using the first executable available on the path). In the long run, this could also allow tusk to hard-code some adapters for interpreters that don't support the -c convention, such as powershell or go.
As for setting the interpreter by task, I agree it's probably easy to work around by adding another file and calling it as a script. It is not a priority feature for us to have this.
Based on our current work-around for setting SHELL, setting other environment variables at tusk initialization would be very useful, but I don't think it's the most declarative way of setting an interpreter. I.e., it's a separate issue.

As an FYI, the current work-around we have for setting the SHELL (and other environment variables) at start-up looks like this:


tasks:
  _set-env:
    usage: "Set task runner environment"
    private: true
    options:
      gobin:
        private: true
        default:
          command: echo $(pwd)/bin
      path:
        default:
          command: echo "${gobin}:$${PATH}"
    run:
      - when:
          environment:
            _TUSK_ENV_SET: ~
        set-environment:
          SHELL: /bin/sh
          GOBIN: ${gobin}
          PATH: ${path}
          _TUSK_ENV_SET: "1"
  example-task-1:
    run:
      - task: _set-env
      - ...
  example-task-2:
    run:
      - task: _set-env
      - ...
  ...

We are able to maintain this for our most used tusk file, but it's not something we want to replicate for every tusk file we write.

For running Go code in a task, it might be good to look into e.g. gosh. However, it does not currently seem to support the -c flag (or the #! convention) when run as an executable.

Just changing to always use /bin/sh and ignore the SHELL variable would be a huge improvement for us.

This is really helpful to know. I don't think long-term it makes sense to support SHELL; none of the options I'm considering make it seem like part of the solution. There's a good chance in the short term I remove support for SHELL even before figuring out the exact semantics for setting a custom interpreter, which I'm guessing will solve more problems than it creates.

As an additional viewpoint, here's what GNU Make has to say:

The program used as the shell is taken from the variable SHELL. If this variable is not set in your makefile, the program /bin/sh is used as the shell. ... Unlike most variables, the variable SHELL is never set from the environment. This is because the SHELL environment variable is used to specify your personal choice of shell program for interactive use. It would be very bad for personal choices like this to affect the functioning of makefiles. ... However, on MS-DOS and MS-Windows the value of SHELL in the environment is used, since on those systems most users do not set this variable, and therefore it is most likely set specifically to be used by make. On MS-DOS, if the setting of SHELL is not suitable for make, you can set the variable MAKESHELL to the shell that make should use; if set it will be used as the shell instead of the value of SHELL.

Seems well reasoned. It might make sense to leave in Windows support or add a TUSK_SHELL that behaves like SHELL does now, in case anyone is intentionally using that behavior.

Either way I'm going to think on this a bit, but I'd like to address it soon.

One possible implementation (assuming the interpreter is specified as a plain string) would be to have an override map of type:

var interpreters = map[string]func(ctx context.Context, args []string) error {
    "node": runNode,
    "gosh": runGosh,
}

Any interpreter not in the map would fall back to using ${interpreter} -c "${contents}". This could also be extended to support embedded interpreters (matched by name), or other interpreters that need more customization than swapping -c for -e, such as creating temporary files for Go "interpretation".
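As a rough illustration of the override-map idea, the Go sketch below keys argv builders by interpreter name and falls back to -c for everything else. It deliberately builds the command line instead of running it, so it differs from the func(ctx, args) error signature above; argvBuilder, overrides, and argvFor are hypothetical names, and the node and ruby entries are only examples.

```go
package main

import "fmt"

// argvBuilder turns a script into the full argv for one interpreter.
type argvBuilder func(script string) []string

// overrides lists interpreters that do not accept the -c convention.
// Hypothetical entries: both node and ruby take inline code via -e.
var overrides = map[string]argvBuilder{
	"node": func(s string) []string { return []string{"node", "-e", s} },
	"ruby": func(s string) []string { return []string{"ruby", "-e", s} },
}

// argvFor uses a registered override when one exists and otherwise
// falls back to the shell-like form: <interpreter> -c <script>.
func argvFor(interpreter, script string) []string {
	if build, ok := overrides[interpreter]; ok {
		return build(script)
	}
	return []string{interpreter, "-c", script}
}

func main() {
	fmt.Printf("%q\n", argvFor("node", "console.log('hi')"))
	fmt.Printf("%q\n", argvFor("bash", "echo hi"))
}
```

An adapter that needs more than a different flag (for example, writing a temporary file) would replace the builder with a function that actually runs the script, as in the original signature.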

I ended up following similar semantics to a shebang to allow the flexibility to effectively run any kind of script that can take a file as input. Under the hood, it's as simple as writing to a temp file and passing that as an arg.

I've got a working version up at #74, feel free to look it over if you have any feedback. Otherwise I'll get that into a new release, probably in the next week or so.

rliebz / tusk

Allow specifying shell/interpreter #72