tomeichlersmith commented 6 months ago

Is your feature request related to a problem? Please describe.

I'd like to write scripts that run programs in a denv. The easiest way to do this is to write a normal script and then execute it with denv:

denv my-script.sh

But sometimes, this execution path is unavailable or bloated. For example, sometimes I want to run my-script.sh in a denv that resides somewhere else. The currently supported way to do this is

denv_workspace=/full/path/to/denv denv my-script.sh

It is a hassle to type out the full denv_workspace path, and sometimes it's not possible to do (for example, in batch processing contexts). This forces me to write a second script that wraps my-script.sh:

#!/bin/sh
denv_workspace=/full/path/to/denv \
  denv $@

which unfortunately means another round of shell interpretation when expanding $@ and another file to carry around.
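As a small illustration of the extra round of interpretation, here is a sketch (the helper names count_args, forward_unquoted and forward_quoted are made up for this example) showing how an unquoted $@ re-splits arguments while a quoted "$@" forwards them unchanged:

```sh
#!/bin/sh
# Print how many arguments were received.
count_args() { echo "$#"; }

# Unquoted $@: each argument is split again on whitespace (and glob-expanded).
forward_unquoted() { count_args $@; }

# Quoted "$@": each argument is forwarded exactly as it was received.
forward_quoted() { count_args "$@"; }

forward_unquoted 'two words' single   # prints 3
forward_quoted   'two words' single   # prints 2
```

Quoting would avoid the re-splitting in the wrapper, but the wrapper would still be a second file to carry around.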

Describe the solution you'd like

It'd be really cool if I could specify denv with a shebang so that my-script.sh is automatically run within the denv. Something like

#!/usr/bin/env denv
<rest of my-script.sh contents>

This would allow me to have a single file whose first line specifies that it is run by denv. I've tried this, but it does not work out of the box. It appears to hang, probably because I misunderstand how the script is given to denv when denv is specified in the shebang.

This solution could be expanded by adding a denv option specifying what should be run within the denv.

#!/usr/bin/env denv --shebang python
print("hello world")

Or using the current "remote" running capability

#!/usr/bin/env denv_workspace=/full/path/to/denv denv
<shell script contents>

Describe alternatives you've considered

A wrapper script like the one shown above may be able to function as a shebang, but it would still introduce the bloat of an additional script to carry around.

Additional context

GNU parallel has a --shebang option: https://www.gnu.org/software/parallel/parallel_tutorial.html#shebang

Looking at the parallel source and searching for --shebang reveals that it needs to re-execute itself when acting as a shebang. It looks like we want to mimic --shebang-wrap so that the user can tell denv which program to give the script to within the container.

        # Program is called from #! line in script
        # remove --shebang-wrap if it is set
        $opt::shebang_wrap = ($ARGV[0] =~ s/^--shebang-?wrap *//);
        # remove --shebang if it is set
        $opt::shebang = ($ARGV[0] =~ s/^--shebang *//);
        # remove --hashbang if it is set
        $opt::shebang .= ($ARGV[0] =~ s/^--hashbang *//);
        if($opt::shebang) {
            my $argfile = Q(pop @ARGV);
            # exec myself to split $ARGV[0] into separate fields
            exec "$0 --skip-first-line -a $argfile @ARGV";
        }
        if($opt::shebang_wrap) {
            my @options;
            my @parser;
            if ($^O eq 'freebsd') {
                # FreeBSD's #! puts different values in @ARGV than Linux' does
                my @nooptions = @ARGV;
                get_options_from_array(\@nooptions);
                while($#ARGV > $#nooptions) {
                    push @options, shift @ARGV;
                }
                while(@ARGV and $ARGV[0] ne ":::") {
                    push @parser, shift @ARGV;
                }
                if(@ARGV and $ARGV[0] eq ":::") {
                    shift @ARGV;
                }
            } else {
                @options = shift @ARGV;
            }
            my $script = Q(Q(shift @ARGV)); # TODO - test if script = " "
            my @args = map{ Q($_) } @ARGV;
            # exec myself to split $ARGV[0] into separate fields
            exec "$0 --_pipe-means-argfiles @options @parser $script ".
                "::: @args";
        }
    }
    if($ARGV[0] =~ / --shebang(-?wrap)? /) {
        ::warning("--shebang and --shebang-wrap must be the first ".
                  "argument.\n");
    }
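The re-exec in the excerpt above exists because Linux hands everything after the interpreter path to the program as one single argument. A minimal POSIX sh sketch of the same field splitting is below; the function name split_first_arg is hypothetical and this is not code from denv or parallel. It assumes the first argument is the combined option string from the #! line.

```sh
#!/bin/sh
# Hypothetical sketch: split a combined shebang argument into fields.
# On Linux, a first line of '#!/path/to/prog --shebang python' runs
#   /path/to/prog '--shebang python' /path/to/script [args...]
# so "$1" holds both words as a single argument.
split_first_arg() {
    combined=$1
    shift
    # Disable globbing so characters such as '*' are not expanded,
    # then rely on unquoted word splitting to separate the fields.
    set -f
    set -- $combined "$@"
    set +f
    # Print one argument per line so the result is easy to inspect.
    printf '%s\n' "$@"
}

split_first_arg '--shebang python' ./my-script.sh 'arg with space'
```

Only the first argument is re-split; the script path and any user arguments are kept intact.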

tomeichlersmith commented 5 months ago

The shebang #! syntax is not very flexible and breaks the portability of many scripts because it is handled by the kernel [^1]. The maximum length of the shebang line (in characters) varies across kernel types and versions, going as low as 16 characters on really old systems. The more common limit is a few hundred characters, which a longer path to the denv_workspace can quickly reach [^2]. Some solutions exist that I will look at adapting to our case. https://github.com/spack/sbang is of particular interest since I could foresee denv acting like sbang does.

[^1]: See this unix stackexchange answer: https://unix.stackexchange.com/a/29620
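To make the length concern concrete, here is a rough sketch of a check that warns when a script's first line exceeds a limit. The function names are hypothetical, and the default of 127 is an assumption modeled on older Linux kernels; the real limit depends on the system.

```sh
#!/bin/sh
# Hypothetical sketch: measure and check the length of a script's first line.

# Print the length of the first line of file $1.
# Note: ${#var} counts characters or bytes depending on the shell and locale.
shebang_length() {
    first_line=
    # read returns non-zero at EOF without a trailing newline, so a
    # non-empty partial line is still accepted.
    IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
    printf '%s\n' "${#first_line}"
}

# Return 0 if the first line of $1 fits within the limit $2
# (assumed default: 127), 1 if it is too long, 2 if it cannot be read.
check_shebang() {
    limit=${2:-127}
    len=$(shebang_length "$1") || return 2
    if [ "$len" -gt "$limit" ]; then
        printf 'warning: first line of %s is %s characters (limit %s)\n' \
            "$1" "$len" "$limit" >&2
        return 1
    fi
    return 0
}
```

For example, check_shebang my-script.sh 255 would test against a larger assumed limit.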

tomeichlersmith commented 5 months ago

Getting the shebang working with an already-accessible workspace was rather simple. Drawing heavily from sbang, as mentioned above (https://github.com/tomeichlersmith/denv/issues/92#issuecomment-1992182401), I was able to parse the first few lines of the file and set denv_workspace to the value defined in those lines. This enables scripts like the following (assuming # is the comment character for the command):

#!/usr/bin/env -S denv shebang
#!denv_workspace=/full/path/to/workspace
#!/command/to/run/in/denv

code for command in denv

The command in the denv is not passed to the kernel like a shebang, so it doesn't need to be a full path. It is passed to sh -lc like any other instance of running denv cmd, which makes it a bit more flexible than a traditional shebang.
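A rough sketch of how such header lines could be parsed is below. This is not denv's actual implementation; the function name parse_denv_header and the output format are assumptions made for illustration, loosely following the sbang approach of reading the leading #! lines.

```sh
#!/bin/sh
# Hypothetical sketch of sbang-style header parsing (NOT denv's real code).
# Reads the leading '#!' lines of file $1 and prints
#   workspace=<value of denv_workspace, if given>
#   command=<first other '#!' line after the kernel shebang>
parse_denv_header() {
    workspace=
    cmd=
    line_no=0
    while IFS= read -r line; do
        line_no=$((line_no + 1))
        case $line in
            '#!'*) body=${line#'#!'} ;;
            *) break ;;  # the header ends at the first line without '#!'
        esac
        # Line 1 is the real shebang, already consumed by the kernel.
        if [ "$line_no" -eq 1 ]; then
            continue
        fi
        case $body in
            denv_workspace=*) workspace=${body#denv_workspace=} ;;
            *) if [ -z "$cmd" ]; then cmd=$body; fi ;;
        esac
    done < "$1"
    printf 'workspace=%s\ncommand=%s\n' "$workspace" "$cmd"
}
```

Run against the example script above, this would print workspace=/full/path/to/workspace followed by command=/command/to/run/in/denv.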

Still To Do

[ ] decide between the cosmetics of /usr/bin/env -S denv shebang and installing a separate script named denv-shebang or something similar (to avoid requiring -S to be passed to env)
[ ] Implement more thorough checks/warnings on processed shebang lines to avoid bugs
[ ] document this running mode in online book and man pages
[ ] Prevent old "remote" running mode where denv_workspace is defined outside the script. We don't want folks to define it as a part of their host environment because that defeats the purpose of having a denv.

No Workspace

I am interested in making the denv shebang something that could fully specify the denv such that a workspace does not need to be specified. This would make denv more usable in a cluster context where the workspace may not be shared amongst worker nodes and it could be useful for sharing work since the command would run the same as long as the users had denv installed. Supporting this running mode in a stable way would introduce some complexity.

The full config of the denv would no longer be specified in exactly one location. It could be loaded from .denv/config or from the shebang lines (perhaps with some defaults that we'd want to share with denv init as well). This would mean I'd probably want some sort of "config validation" so that we could check that both configs are valid.
Implementing this for apptainer/singularity runners would rely on either (a) symlinking a pre-built SIF everywhere, (b) re-building a SIF image in a location the running script knows about, or (c) redesigning the image-passing infrastructure to support filepaths. All of these options have significant downsides: (a) clutters the filesystem and is kinda "dirty/hacky", (b) is a significant performance hit, and (c) is a pretty major refactor that would have implications for the docker/podman side of things as well.

These two roadblocks are causing me to pause and evaluate what I want to do. Perhaps I will merge my workspace-required version soon (after those To Dos) and start a separate issue for a workspace-less version.

tomeichlersmith / denv

shebang `#!` support #92