mvdan / sh

A shell parser, formatter, and interpreter with bash support; includes shfmt
https://pkg.go.dev/mvdan.cc/sh/v3
BSD 3-Clause "New" or "Revised" License
6.99k stars 333 forks source link

cmd/shfmt: Files with non-shell extension excluded #944

Open katexochen opened 1 year ago

katexochen commented 1 year ago

Hi, I'm using shfmt -f . to list files I want to format. However, many files aren't found as they don't have a .sh/.bash extension, but are named something like mkosi.finalize and I can't change the name as it is expected by some consumer. The files have, however, a valid shebang.

The files are excluded because of the following line:

https://github.com/mvdan/sh/blob/279b806d8f2417af933a40a726ad8299395d9183/fileutil/file.go#L80

Would be great to just omit it, and do the shebang check in any case.

mvdan commented 1 year ago

Interesting edge case. I did it this way for two reasons:

I think your case is a bit special because .finalize is not really an extension from the looks of it. But I also can't imagine what a good heuristic would be. If the extension is longer than four characters, it's not really an extension?

katexochen commented 1 year ago

If there's a foo.go file which happens to have a valid shell shebang

I don't think a valid Go file can have a valid shebang at the beginning, can it? And your shebang regex checks for the shebang at the beginning of a file, so this shouldn't be a problem.

Avoiding overhead.

Yeah, I see that point. There are situations where performance isn't that relevant, like in CI, where it would be okay to run a minute instead of a second. You could add a flag, but I understand that isn't a good solution either.

mvdan commented 1 year ago

I don't think a valid Go file can have a valid shebang at the beginning, can it? And your shebang regex checks for the shebang at the beginning of a file, so this shouldn't be a problem.

Right, in practice it's not really a Go file. But I would argue the file doesn't look like a shell file, either. The Go toolchain and IDEs would pick it up as a Go file, for example. That's what I mean by being conservative - we should likely not format files which don't appear to be shell files.

After all, it's always possible to format extra files if you need to (e.g. shfmt -w **/*.finalize), but it's harder to exclude formatting files from shfmt -w .. You would need to use shfmt -f, filter the list, and then feed it back.

Yeah, I see that point. There are situations where performance isn't that relevant, like in CI, where it would be okay to run a minute instead of a second. You could add a flag, but I understand that isn't a good solution either.

I'm not opposed to a new feature (like a new flag) as long as it's reasonable and commonly needed - I think this scenario is a bit too niche to warrant a flag. If I were you, assuming that these shell files with odd extensions are easy to find (either via **/*.finalize or finalizers/*), I would probably include them explicitly like: shfmt -w . **/*.finalize.

mvdan commented 1 year ago

Another option in terms of new flags would be to turn -f into a flag which also accepts "exhaustive" modes, like -f=anyextension. Though again it feels pretty niche, so I'm not a big fan unless there's good demand for it :)

sellout commented 1 year ago

FWIW, my IDE (Emacs) does recognize a file named foo.go as a Bash file if it has a Bash shebang.

It seems, strictly speaking, that if a file has any executable bit set, then only the shebang should be checked, and if it has no executable bit set, then only the extension should be checked (since no shebang will have an effect). This matches my own usage, where I use .bash on files that are meant to be sourced, and no extension on files that are executable (and thus have a shebang). However, shfmt -f . already seems to check the shebang for files without an extension, so It Works For Me™️ as-is.

mvdan commented 1 year ago

I actually thought about using executable bits; regardless what a file's extension is, if it has an executable bit set, we could look for a shebang inside it. Since executable files are relatively rare in software development scenarios, it could be a nice trade-off conceptually.

However, it would slow down formatting directory trees, especially when there are many files. We use filepath.WalkDir, which only uses readdir to read the contents of directories - that gives us the name of each file and its type (regular file, directory, symlink, etc). It does not give us the permission bits; an extra lstat call per file is needed for that on Linux and Mac.

For more information, see https://github.com/golang/go/issues/42027.

I'm still happy to entertain a flag to tell shfmt to spend extra CPU time statting, opening, and reading many more files than it usually would. But I definitely don't want to make it the default behavior, because it will be much slower when there are many non-shell files, which is common. Walking one directory with a thousand Go files would suddenly go from one syscall (readdir) to over a thousand (readdir + 1000*lstat), for example.

sellout commented 1 year ago

Yeah, agreed on the default. Maybe a flag like --precise-identification with the description, “Check shebang lines in executable files to identify shell scripts more precisely. This can have a performance impact on larger file trees and is not helpful if all of your shell scripts have standard extensions.”