ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Support shebang #2703

Closed marler8997 closed 5 years ago

marler8997 commented 5 years ago

This is a feature taken directly from D.

D supports the "shebang line" at the beginning of D source files. Supporting it in Zig would allow POSIX platforms to execute .zig files directly. For example:

hello.zig:

#!/usr/bin/env zigrun
const std = @import("std");

pub fn main() !void {
    // If this program is run without stdout attached, exit with an error.
    const stdout_file = try std.io.getStdOut();
    // If this program encounters pipe failure when printing to stdout, exit
    // with an error.
    try stdout_file.write("Hello, world!\n");
}
$ chmod +x hello.zig
$ ./hello.zig
Hello, world!

Another use case would be to allow build.zig files to be executed directly, e.g.:

build.zig:

#!/usr/bin/env zigbuild
// ...
$ chmod +x build.zig
$ ./build.zig

To support this, the zig parser would need to be able to detect and ignore the first line of a .zig file if it starts with #!.
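The rule above is small: only line 1, and only when it starts with #!. Zig's real change would live in its tokenizer; purely as an illustration of the rule, here is a POSIX shell/awk filter that a hypothetical wrapper (such as the zigrun idea) could apply before handing a file to the compiler. The name strip_shebang is made up for this sketch.

```sh
#!/bin/sh
# strip_shebang (hypothetical helper): print Zig source with a leading "#!"
# line blanked out. Only the first line of each input file is checked.
# The line is replaced by an empty line rather than deleted, so compiler
# errors still point at the original line numbers.
strip_shebang() {
  awk 'FNR == 1 && /^#!/ { print ""; next } { print }' "$@"
}

# Example: strip_shebang hello.zig > /tmp/hello.zig && zig run /tmp/hello.zig
```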

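Also, since on many systems (including Linux) a shebang line passes at most one argument to its interpreter (everything after the interpreter path arrives as a single string), zig run and zig build would need to be available as commands that take just a filename, e.g.: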

zigrun:

#!/usr/bin/env bash
# The kernel calls this as: zigrun <script> [args...]
script="$1"; shift
exec zig run "$script" -- "$@"

zigbuild:

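#!/usr/bin/env bash
# The kernel calls this as: zigbuild <build.zig> [steps...]
build_file="$1"; shift
exec zig build --build-file "$build_file" "$@"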

Another solution would be to have zig check argv[0] (the name it was invoked as) to see if it ends with zigrun or zigbuild, and then create symbolic links to zig, e.g.:

ln -s zig "$(dirname "$(which zig)")/zigrun"
ln -s zig "$(dirname "$(which zig)")/zigbuild"
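The argv[0] check can be sketched in POSIX shell to show the dispatch logic; the proposal itself is for the zig binary to do this, so treat this as an illustration only. The function name zig_multicall and the ZIG environment variable (an override for the compiler path) are made up here; `--build-file` is an existing `zig build` option, and the `--` separator follows the `zig run` convention used later in this thread.

```sh
#!/bin/sh
# zig_multicall (hypothetical): argv[0] dispatch sketched in shell. The
# proposal is for the zig binary itself to do this; this only shows the logic.
# ZIG is a made-up override for the compiler path (defaults to "zig").
zig_multicall() {
  name=${1##*/}; shift                    # "/usr/local/bin/zigrun" -> "zigrun"
  if [ $# -eq 0 ]; then
    echo "usage: $name FILE [ARGS...]" >&2
    return 2
  fi
  case $name in
    zigrun)                               # called as: zigrun <script> [args...]
      script=$1; shift
      "${ZIG:-zig}" run "$script" -- "$@" ;;
    zigbuild)                             # called as: zigbuild <build.zig> [steps...]
      build_file=$1; shift
      "${ZIG:-zig}" build --build-file "$build_file" "$@" ;;
    *)
      echo "zig_multicall: unknown command name '$name'" >&2
      return 2 ;;
  esac
}

# Dispatch only when invoked through one of the symlinked names.
case ${0##*/} in
  zigrun|zigbuild) zig_multicall "$0" "$@"; exit ;;
esac
```

To try it, save it as zig_multicall, make it executable, and create links named zigrun and zigbuild that point at it.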
daurnimator commented 5 years ago

Shebang support was recently removed from zig in #2165 (though I wanted to keep it).

andrewrk commented 5 years ago

Yep, have a look at #2165 and all the information & arguments there. If you have something new to bring to the table, leave it as a comment here, and I'll re-open this proposal.

marler8997 commented 5 years ago

Lol, thanks for the reference. I didn't see any comments stating the main use case for this. It's true that requiring a user to run zig run script.zig is not any more difficult than chmod +x script.zig; ./script.zig. However that's not the main reason for supporting shebang. The main reason is to be able to run a script without having to know what language/interpreter it requires. This would be if you wanted to replace your bash/python/perl scripts with zig without having to change all the places where those scripts are invoked:

dosomething:

#!/usr/bin/env bash
# some bash code

Now say you wanted to replace this with zig WITHOUT having to depend on BASH:

dosomething:

#!/usr/bin/env zigrun
pub fn main() !void {
    // some zig code
}
andrewrk commented 5 years ago

I believe I addressed this in https://github.com/ziglang/zig/issues/2165#issuecomment-478813464 and I don't feel that my counterpoints there are addressed here.

marler8997 commented 5 years ago

I'll preface this by saying I don't have a strong opinion on this, but will share my thoughts and experience so that all the information will have been presented.

Relying on system zig goes against one of the fundamental things Zig is trying to accomplish: reliable, consistent build environments, which are to some degree insulated from the differences in various systems.

This point seems orthogonal to whether or not shebang should be supported, as you wouldn't necessarily need to use one system-wide version of zig. The "interpreter" (the program specified in the shebang line) could, for example, select or find a particular version of the zig compiler specifically for the zig file being executed.

Granted, the examples given don't do this (e.g. #!/usr/bin/env zigrun); however, you could imagine a program that could, e.g. #!/usr/bin/env zigrunner, where zigrunner identifies the script and tries to find a matching zig compiler to execute it. Or the zig compiler itself could perform this check, possibly by checking its version against something inside the zig "script", which brings us to your second point:

We want source code to be able to specify the version of zig it depends on, potentially allowing the zig language to make backwards incompatible changes without actually breaking code.

I think this is a great idea. Something I've tried to discuss with D community: https://forum.dlang.org/post/jeycdimrrdteivmaqebf@forum.dlang.org

This one also seems orthogonal to whether shebang should be supported; in fact, it makes it easier to support shebang, as it provides a way for the script itself to declare compiler/language version dependencies rather than "hoping" that the user has the right versions installed.

We want source code to be able to depend on packages.

This one also seems orthogonal to me. Maybe I'm missing something though.

The use cases of shebang lines that I can think of are: [...]

There's really only one use case I can think of to support shebang: being able to write scripts in zig that are language/interpreter agnostic from the caller's perspective. It encapsulates the script's "interpreter" from the caller, because in most cases the caller doesn't care how the program runs or what language it's written in, only that it works, e.g.

dosomething:

#!/usr/bin/env zig_runner_program

// zig code

The script is named dosomething and is callable from any other component in the system without having to know whether it is running zig/bash/perl/whatever. Google's BASH guidelines mention this concept as well:

https://google.github.io/styleguide/shell.xml

File Extensions: Executables should have no extension (strongly preferred) or a .sh extension. Libraries must have a .sh extension and should not be executable. It is not necessary to know what language a program is written in when executing it and shell doesn't require an extension so we prefer not to use one for executables.

Now to be fair, one can use multiple files and leverage other tools like bash to achieve the same "encapsulation", i.e.

dosomething.zig:

// zig code

dosomething:

#!/usr/bin/env bash
exec /usr/bin/env zig_runner_program "$(dirname "$0")/dosomething.zig" "$@"

The downside being that now all your zig scripts require wrapper bash scripts to keep from exposing the fact that they are zig scripts to the caller.

I think this one use case is all that needs to be considered when deciding whether the language should support shebang lines.

andrewrk commented 5 years ago

There's really only one use case I can think of to support shebang. That is to be able to write scripts in zig that are language/interpreter agnostic from the caller's perspective.

More ideal would be to build the zig code into an ELF file and then use that as the script. I fail to see how the shebang setup process that you've described here would be less cumbersome than an actual build process. Trying to build zig code with a shebang line inherently introduces problematic system dependencies. I think this proposal encourages people to go down ultimately dead end paths, when really we would rather have people using the build system and package manager. I think this bash workaround you have described here is exactly the right amount of friction towards using Zig to write scripts in this way.

marler8997 commented 5 years ago

I fail to see how the shebang setup process that you've described here would be less cumbersome than an actual build process.

I wouldn't say that supporting shebang is "less" or "more" cumbersome than a build process, because it's not a replacement for a build process. It would depend on a build process (the zig run build process is one example). The shebang line allows the file itself to declare which interpreter it requires rather than requiring the caller to select the right one. This actually provides the opportunity to make the script more "resilient" to system changes, as the interpreter used to run it could perform checks before running the script.

For example, imagine you write a script named "foo" in zig. The user doesn't know it's written in zig; they just expect to be able to execute it:

./foo

If you require the user to select the right interpreter, rather than having the file itself select the interpreter, then you open yourself up to more potential errors:

bash foo                  # fails, this isn't a bash script
zig run foo               # fails, wrong version of zig
~/zig2.3/zig run foo      # fails, wrong version of zig

Keep in mind, the shebang line in this foo script wouldn't be #!/usr/bin/env zig; in this example it uses a theoretical tool that can run zig scripts, find/select the right compiler, and resolve dependencies (e.g. #!/usr/bin/env zig_script_runner).
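Such a runner could be sketched in a few lines of POSIX shell. Everything about it is hypothetical: the name zig_script_runner comes from the comment above, but the `// zig-version:` pin comment, the `$ZIG_VERSIONS_DIR/<version>/zig` install layout, and the assumption that the compiler skips a leading `#!` line (as this proposal asks) are all made up for illustration.

```sh
#!/bin/sh
# zig_script_runner: HYPOTHETICAL sketch; no such tool exists. Assumptions:
#   * a script pins its compiler with a comment like "// zig-version: 0.11.0"
#   * compilers are installed as $ZIG_VERSIONS_DIR/<version>/zig
#   * the compiler ignores a leading "#!" line, as this proposal requests

# Print the version a script asks for (first match only), or nothing.
required_zig_version() {
  sed -n 's|^// zig-version: *\([^ ]*\).*$|\1|p' "$1" | head -n 1
}

# Print the compiler path to use for a script, or fail if it is missing.
select_zig() {
  version=$(required_zig_version "$1")
  if [ -z "$version" ]; then
    command -v zig                        # no pin: use whatever zig is on PATH
    return
  fi
  candidate="${ZIG_VERSIONS_DIR:-$HOME/.zig-versions}/$version/zig"
  if [ ! -x "$candidate" ]; then
    echo "zig $version is not installed (expected $candidate)" >&2
    return 1
  fi
  printf '%s\n' "$candidate"
}

# As a shebang interpreter it is called as: zig_script_runner <script> [args...]
case ${0##*/} in
  zig_script_runner)
    script=$1; shift
    zig=$(select_zig "$script") || exit 1
    exec "$zig" run "$script" -- "$@" ;;
esac
```

Failing loudly when a pinned version is missing, instead of silently using whatever zig is installed, is the kind of pre-run check that would make such scripts more resilient to system changes.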

Trying to build zig code with a shebang line inherently introduces problematic system dependencies.

I think this is a common misconception. Shebang lines really have nothing to do with system dependencies. All they do is allow a file to declare which program should be used to interpret it. This interpreter could be a system-wide dependency (which it usually is) or it could be a "non-system" local tool.

That being said, the kernel does no PATH lookup for the interpreter, so in practice it has to be an absolute path, which is why /usr/bin/env is so commonly used. But there's nothing stopping you from putting any interpreter in your shebang line.

One Linux distribution, NixOS, doesn't install bash and other tools in standard locations like /bin/bash. With Nix, every package goes into its own directory, so when packages are built, all the shebang lines are post-processed to point to the correct package, e.g.

#!/bin/bash
...

becomes:

#!/nix/store/cinw572b38aln37glr0zb8lxwrgaffl4-bash-4.4-p23/bin/bash
...
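To make the Nix point concrete, here is a much-simplified sketch of that post-processing step. It is not nixpkgs' actual `patchShebangs` hook, which handles many more cases: this hypothetical patch_shebang only rewrites `#!/path/interp [args]` and `#!/usr/bin/env interp [args]` to the absolute path that `command -v` finds on the current PATH, and it does not understand `env` options such as `-S`.

```sh
#!/bin/sh
# patch_shebang (hypothetical helper): a much-simplified take on Nix's
# shebang patching, NOT the real patchShebangs. Rewrites
#   #!/bin/bash [args]          -> #!<absolute path of bash on PATH> [args]
#   #!/usr/bin/env bash [args]  -> #!<absolute path of bash on PATH> [args]
patch_shebang() {
  file=$1
  first=$(head -n 1 "$file")
  case $first in
    '#!'*) ;;
    *) return 0 ;;                        # no shebang line: leave file alone
  esac
  set -f                                  # no globbing while splitting
  set -- ${first#??}                      # drop "#!" and split into words
  set +f
  [ $# -gt 0 ] || return 1                # bare "#!": nothing to resolve
  interp=${1##*/}; shift                  # "/usr/bin/env" -> "env"
  if [ "$interp" = env ] && [ $# -gt 0 ]; then
    interp=${1##*/}; shift                # real command follows env
  fi
  resolved=$(command -v "$interp") || return 1
  tmp=$(mktemp) || return 1
  {
    printf '#!%s' "$resolved"
    [ $# -gt 0 ] && printf ' %s' "$@"     # keep interpreter arguments
    printf '\n'
    tail -n +2 "$file"                    # rest of the file, unchanged
  } > "$tmp" && cat "$tmp" > "$file"      # cat keeps the original file mode
  rm -f "$tmp"
}
```

On a Nix system, `command -v bash` resolves into the store, which is how a line like `#!/bin/bash` ends up pointing at a `/nix/store/...-bash-.../bin/bash` path.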

I think this proposal encourages people to go down ultimately dead end paths, when really we would rather have people using the build system and package manager.

Using the shebang line isn't a replacement for the build system/package manager, it would depend on them. Zig scripts would still use the same build system/packages whether they are invoked directly using the shebang line or indirectly using explicit interpreters. The only difference is that the shebang line allows the script to be called without the caller having to know which specific interpreter it requires.

Normally, such as in the zig repo, you know which scripts are in zig and which aren't, so this point is irrelevant. This only becomes useful when you start replacing scripts that live outside the zig ecosystem with zig scripts. Take the ldd program. Normally it's a bash script, but maybe someone wants to write it in zig:

ldd:

#!/usr/bin/env zig_script_runner
pub fn main() !void {
    //
}

Right now zig is pretty new so we don't really know if it will be used for things like this. If not then I don't see any reason to support shebang lines.

marler8997 commented 5 years ago

Anyway, like I said, I don't have a strong opinion on this. I think I've explained when this is useful, but it's not clear to me how to quantify the value it adds at this point, so it's not clear to me whether it would be a net gain or loss.

In any case, I don't think it's a priority, at least right now, so I'm going to close this, having shared my thoughts. If at some point we see a need for this feature, we can re-open this proposal to revisit it.

andrewrk commented 5 years ago

For example, imagine you write a script named "foo" in zig.

I just want to note, here are all the ways I would foresee "foo" being distributed:

I don't see why there would be another bullet point here for shebang source file distribution. I see that as strictly worse than any of these other options. If someone is writing scripts for personal use, that's fine, and in that case using zig run or your bash workaround example above is reasonable. That's the "quick 'n dirty user scripts" bullet point from https://github.com/ziglang/zig/issues/2165#issuecomment-478813464

marler8997 commented 5 years ago

Source file shebang distribution is an interesting use case. I haven't thought too much about it, but it is one feature that would be enabled with shebang support. I don't think I would agree that it's "strictly worse than" the other mechanisms you listed as it would make it easy for the end user to debug/understand/modify the program since it doesn't require it to be rebuilt by the user.

When you require that the user build a zig program before it can be executed, the user now has to track two copies of the same program: one that they can understand and one that the computer can execute. With shebang support, you just have one representation, which can be read/modified and executed by the user. Whenever you can reduce duplication (i.e. two representations of the same program), you remove a set of problems (caching, keeping them in sync, knowing whether one has been changed, knowing how to generate one from the other, etc.).

In any case, the use cases for shebang go beyond how to distribute zig programs. It's more general, in that it just provides a way for zig source code to select an interpreter to process it. This does enable a new way to distribute zig programs, but that's a pretty specific use case for the general mechanism.

Mouvedia commented 3 years ago

a theoretical tool that can run zig scripts and find/select the right compiler and resolve dependencies (i.e. #!/usr/bin/env zig_script_runner)

Now that #7673 has been merged, this becomes relevant.

marler8997 commented 3 years ago

Another thought I've had on this is that distributing tools written in Zig via source means that it will always be compiled on the native target allowing for maximum optimization, leveraging shared libraries on the platform, and avoiding other problems with cross-compiled tools.

nektro commented 3 years ago

#!/usr/bin/env -S zig run

const std = @import("std");

pub fn main() anyerror!void {
    std.log.info("All your codebase are belong to us.", .{});
}

The above snippet allows zig run to work properly inside a shebang, with no need for external scripts or aliases. Being able to write scripts in Zig, with all the safety and security that it brings, would allow it to break into the sysadmin space (over bash/python). However, since the support has been removed, the script currently fails to compile with the following error, at least as of 0.8.0-dev.1385+3dd8396a5:

❯ ./test.zig
./test.zig:1:1: error: invalid token: '#'
#!/usr/bin/env -S zig run
^
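The `env -S` part of that snippet can be observed without a Zig install. This sketch writes a throwaway script whose shebang uses `echo` as a stand-in for `zig run` and then runs it. It assumes an `env` that supports `-S` (GNU coreutils 8.30 or later, FreeBSD, macOS); the helper name env_s_demo is made up.

```sh
#!/bin/sh
# env_s_demo (hypothetical helper): shows how a shebang line using env -S is
# expanded, with echo standing in for "zig run" so no Zig install is needed.
# Assumes an env(1) that supports -S (GNU coreutils 8.30+, FreeBSD, macOS).
env_s_demo() {
  dir=$(mktemp -d) || return 1
  script="$dir/show-args"
  printf '#!/usr/bin/env -S echo ran:\n' > "$script"
  chmod +x "$script"
  # On Linux the kernel runs:  /usr/bin/env "-S echo ran:" <script> <args...>
  # (everything after the interpreter path is ONE argument). env -S splits it,
  # then runs:                 echo ran: <script> <args...>
  # Without -S, env would look for a program literally named "echo ran:".
  "$script" "$@"
  status=$?
  rm -rf "$dir"
  return "$status"
}
```

Running `env_s_demo one two` prints something like `ran: /tmp/tmp.XXXXXXXXXX/show-args one two`: the kernel appends the script's path, then the caller's arguments, after whatever the shebang line specifies. With `#!/usr/bin/env -S zig run`, the same expansion turns `./test.zig` into `zig run ./test.zig`.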
mkeedlinger commented 3 years ago

Using Zig over something like bash or python (for the reason @nektro described) is actually what brought me to this issue. It's not terribly uncommon for me to write something quick and dirty and not bother to commit it to SCM, and I've been on teams where a small (but vital) tool was done just as haphazardly.

I think the direct benefit of source distribution for some cases is that you can't lose the source :sweat_smile: This can be a total lifesaver when something needs changing later on.

I did read #2165 and the comment mentioning "quick and dirty scripts" as possibly the strongest argument for supporting shebang, and I agree. I can understand that maybe this isn't the niche Zig wants to fill, but I do think it's a niche people are interested in. Maybe I'm too young, but bash and friends are torture to use, and python is too untyped (I know they have type hints now, but that seems like an afterthought). I also like that if I am trying to run my script on a remote system that doesn't have Zig, I could compile it and send it over, something that can't be said for scripting-only languages.

For anyone who is interested in this niche, I've also been looking at vlang which does support shebangs and has really cool features for scripting. The thing keeping me away is how alpha-level V is and how the author seems to have bitten off a lot (maybe more than a person can chew?).

Just my 2¢.

danielchasehooper commented 1 year ago

You can get shebang-type behavior with this as the first line of a zig file. It even passes arguments along, so they're accessible with std.process.argsAlloc():

//usr/bin/env zig run "$0" -- "$@"; exit

pub fn main() void {
    @import("std").debug.print("chmod +x me.zig, then run me with ./me.zig \n",.{});
}

If you want repeated runs of the (unchanged) file to be faster, you can enable cache: //usr/bin/env zig run --enable-cache "$0" -- "$@" | tail -n +2; exit; the tail command removes a line that zig outputs to show the path to the global cache

yohannd1 commented 1 year ago

Quick question: why does this work? I noticed the extension necessarily has to be .zig for this to work (it worked with a file called a.zig, but not with a).

daurnimator commented 1 year ago

Quick question - why does this work?

  1. Because // is a comment in zig so it ignores the first line
  2. Because lacking a shebang (#!) or an ELF header (or other binfmt_misc thing), most shells will try and run any executable file as a shell script.
  3. When run as a shell script, //usr/bin/env runs the executable at that location
    • on most systems (not all: paths that start with // are special according to POSIX, but three / are not; that's a possible improvement to make) the extra leading / compared to /usr/bin/env will be ignored.
  4. zig run does different things based on file extension. that's why it needs to have .zig
  5. -- tells zig run that the following arguments should be passed to the program after it's compiled
  6. "$@" means all arguments to the current script
  7. the ; exit is to stop your shell from trying to run the rest of the file as a shell script.
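The steps above can be checked without a Zig toolchain by swapping `zig run` for `echo`. This sketch writes a file with the same first-line pattern (only the command differs), marks it executable, and runs it from the shell; the helper name polyglot_demo is made up.

```sh
#!/bin/sh
# polyglot_demo (hypothetical helper): reproduces the "//usr/bin/env" first
# line with echo standing in for "zig run", so it works without Zig.
polyglot_demo() {
  dir=$(mktemp -d) || return 1
  file="$dir/me.zig"
  # Line 1 is a Zig comment but also a shell command; "; exit" stops the
  # shell before it reaches the Zig code on line 2.
  cat > "$file" <<'EOF'
//usr/bin/env echo would-run "$0" -- "$@"; exit
pub fn main() void {}
EOF
  chmod +x "$file"
  # No "#!" line, so execve() fails with ENOEXEC and the calling shell falls
  # back to running the file as a shell script, with $0 set to its path.
  "$file" "$@"
  status=$?
  rm -rf "$dir"
  return "$status"
}
```

Calling `polyglot_demo a b` prints a line ending in `/me.zig -- a b`: the shell treated the file as a script, `$0` is the file's own path, and `; exit` kept the shell away from the Zig code on line 2.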