cmd/shfmt: change the language default from "bash" to "auto"

mvdan commented 3 years ago

That is, auto-detection of what language the input shell should be in. This is because, right now, running something like shfmt -l -w . on a large repository will simply use the default of -ln=bash for all files. One can use -ln=posix to change that, but then all files are parsed as POSIX.

It's not terribly common to mix different shell languages in a single repository, but it happens. For example, imagine a repository which has a few simple POSIX shell scripts for extreme portability, some complex Bash scripts for development, and a few Bats files for unit testing other shell scripts. Right now, this would require at least three shfmt calls, and manually hard-coding which files are in which language.

We can do better. The files themselves often tell us what language they are in via the filename extension and the shebang.

I propose the following algorithm to auto-detect the language. When a language is selected, the following steps aren't considered.

1) If the filename ends with .bash, .mksh, or .bats, we use that language. 2) If the content begins with a valid shell shebang like #!/bin/sh or #!/usr/bin/env bash, we use that language. Here, sh means POSIX shell. 3) We fall back to the current default, assuming Bash.

Step 1 does not assume .sh means POSIX, because Bash scripts with that extension are too common. If a file is meant to be POSIX shell, give it a sh shebang, which will be matched by step 2.

Step 3 allows us to not break backwards compatibility. If we don't know what language the input is in, we'll just keep the current conservative fallback that is Bash.

To my mind, the only downside to this change is that it adds a bit of complexity; we now need to explain what -ln=auto means. But I think that should be doable in a few lines in the added man page, just like we did in three list items above.

--

This is a reboot of https://github.com/mvdan/sh/issues/429, which I actually forgot about. Fun how I came up with almost exactly the same idea again. The reason I think this is worth considering again is two-fold:

The default of -ln=bash is still too stiff, in my opinion. Support for Bats is a clear example; a project having both Bash and Bats files is entirely normal.
EditorConfig does kind of mitigate this problem, but having to enter [*.bats]\nshell_variant = "bats" is silly. Plus, since EditorConfig is entirely filename-based, it does not cover Step 2 from this proposal.

CC @jansorg @kolyshkin @lassik @kaey since you all contributed to related issues. If we reach consensus on this, I'd implement this for 3.3.0, which should come out in early 2021.

jansorg commented 3 years ago

Sounds good to me! You might want to consider to make this the same or similar to ShellCheck, which has its own detection logic. That would help to be able to validate and format files without in a project using different ways to markup the file type. I haven't looked up its detection logic in the code, but it's additionally using # shellcheck shell= directives, https://github.com/koalaman/shellcheck/wiki/Directive#shell.

mvdan commented 3 years ago

It seems like their algorithm is defined in https://github.com/koalaman/shellcheck/commit/a4b9cec9f00ba31e8b48c379296d2bb1d4bb3638:

ShellCheck directive

Shebang

File extension

I don't think I want to teach shfmt about shellcheck directives. It's an explicit design decision of our parser to not treat any comments as special directives. I think shebangs are the only universal exception here, and already provide what we need.

I also slightly disagree with the order. If a foo.bash file contains #!/bin/sh, it should be treated as Bash, not as POSIX. You can argue that such a file would be contradictory and I would agree, but filename extension should trump shebangs because it's the first thing a user will see.

I agree that it would be nicer if all tools would agree on a "shell language detection" standard, but given that ShellCheck went for a custom algorithm based on directives, I'm afraid that's already an impossibility. shfmt already has a -f flag to find shell files though, so I could add a similar mechanism to have shfmt tell you what language it thinks a script is written in, and then you could apply that to the rest of your shell tools/logic.

jansorg commented 3 years ago

Thanks, that makes sense. Regarding the autodetection of foo.bash with #!/bin/sh: This controls runtime behaviour (Bash will run in posix mode when executed as sh). ShellCheck needs to know about this, but not shfmt. Therefore the detection logic above seems right to me for shfmt.

kaey commented 3 years ago

Consider following repo:

cmd/prog1
cmd/prog2
lib/funcs.sh

progs are executable scripts and contain #!/bin/ash shebang. funcs.sh is sources by both progs and has no shebang.

In this example autodetection will detect progs as posix(probably?) and funcs as bash.

mvdan commented 3 years ago

The current proposal wouldn't understand posix-y shells like ash or dash, so your prog files would default to bash. We could extend the rules in the future, but for now I want to keep them simple.

Similarly, since funcs.sh has no shebang and the .sh extension is inconclusive (see the explanation right after the steps in the original post), it would also parse as bash.

In general, if you have a library POSIX shell file that's meant to be sourced but not executed directly, I think your best bet is to add an explicit POSIX shebang at the top of it, like #!/bin/sh. The script can remain without the executable permission bit to signify that it's not meant to be a program to be run directly.

If you don't like having shebangs in library files, then your other option is EditorConfig, like:

[lib/*]
shell_variant = "posix"

# or for all .sh files, to override the "bash" fallback
[*.sh]
shell_variant = "posix"

kaey commented 3 years ago

The current proposal wouldn't understand posix-y shells like ash or dash, so your prog files would default to bash.

That's another inconsistency

In general, if you have a library POSIX shell file that's meant to be sourced but not executed directly, I think your best bet is to add an explicit POSIX shebang at the top of it, like #!/bin/sh

Nobody's going to do that.

If you don't like having shebangs in library files, then your other option is EditorConfig

Current rules are pretty simple - parse everything as bash unless EditorConfig says otherwise.

This proposal lets group of programmers targeting bash avoid writing EditorConfig, but group targeting posix shell (and other implementations) will have to deal with some inconsistencies.

In the end it's a question about whether one group is large enough that it's worth creating more rules (more code, more docs) for solving their problem, while adding some inconsistencies for the other group.

mvdan commented 3 years ago

That's another inconsistency

Like I said, we could teach the tool to understand other posix-y shells in the future.

Nobody's going to do that.

Nobody is forced to do that, either. The bash fallback is still there if you don't care. And you can always be explicit via a flag or via EditorConfig if you want.

Current rules are pretty simple - parse everything as bash unless EditorConfig says otherwise.

The rules are simple, but IMO the default is too rigid. It's fine for the parser to have extremely simple rules, where you always have to say what language you want if it's not Bash. But shfmt is meant to be more user-friendly, which is why EditorConfig was added in the first place.

group targeting posix shell (and other implementations) will have to deal with some inconsistencies.

I disagree. If anything, almost nothing changes for them. If they weren't specifying posix, they'll just keep on getting the same bash fallback. If they're specifying posix in some way, they'll get posix. And they always have the option to explicitly state that all of their code is posix.

mvdan commented 2 years ago

It's taken me a while to come back to this - I had a half-finished local branch from over a year ago, and I hadn't realised it would take me a few extra hours to iron out the last few details with tests.

Please give https://github.com/mvdan/sh/pull/796 a try and let me know how it works :) Reviews are also very much welcome.

mvdan / sh

cmd/shfmt: change the language default from "bash" to "auto" #622