Open katexochen opened 1 year ago
Interesting edge case. I did it this way for two reasons:
foo.go
file which happens to have a valid shell shebang, should we format it? My intuition is that we should not.shfmt .
(when walking directories) could easily get twice as slow due to the extra I/O.I think your case is a bit special because .finalize
is not really an extension from the looks of it. But I also can't imagine what a good heuristic would be. If the extension is longer than four characters, it's not really an extension?
If there's a foo.go file which happens to have a valid shell shebang
I don't think a valid Go file can have a valid shebang at the beginning, can it? And your shebang regex checks for the shebang at the beginning of a file, so this shouldn't be a problem.
Avoiding overhead.
Yeah, I see that point. There are situations where performance isn't that relevant, like in CI, where it would be okay to run a minute instead of a second. You could add a flag, but I understand that isn't a good solution either.
I don't think a valid Go file can have a valid shebang at the beginning, can it? And your shebang regex checks for the shebang at the beginning of a file, so this shouldn't be a problem.
Right, in practice it's not really a Go file. But I would argue the file doesn't look like a shell file, either. The Go toolchain and IDEs would pick it up as a Go file, for example. That's what I mean by being conservative - we should likely not format files which don't appear to be shell files.
After all, it's always possible to format extra files if you need to (e.g. shfmt -w **/*.finalize
), but it's harder to exclude formatting files from shfmt -w .
. You would need to use shfmt -f
, filter the list, and then feed it back.
Yeah, I see that point. There are situations where performance isn't that relevant, like in CI, where it would be okay to run a minute instead of a second. You could add a flag, but I understand that isn't a good solution either.
I'm not opposed to a new feature (like a new flag) as long as it's reasonable and commonly needed - I think this scenario is a bit too niche to warrant a flag. If I were you, assuming that these shell files with odd extensions are easy to find (either via **/*.finalize
or finalizers/*
), I would probably include them explicitly like: shfmt -w . **/*.finalize
.
Another option in terms of new flags would be to turn -f
into a flag which also accepts "exhaustive" modes, like -f=anyextension
. Though again it feels pretty niche, so I'm not a big fan unless there's good demand for it :)
FWIW, my IDE (Emacs) does recognize a file named foo.go
as a Bash file if it has a Bash shebang.
It seems, strictly speaking, that if a file has any executable bit set, then only the shebang should be checked, and if it has no executable bit set, then only the extension should be checked (since no shebang will have an effect). This matches my own usage, where I use .bash
on files that are meant to be sourced, and no extension on files that are executable (and thus have a shebang). However, shfmt -f .
already seems to check the shebang for files without an extension, so It Works For Me™️ as-is.
I actually thought about using executable bits; regardless what a file's extension is, if it has an executable bit set, we could look for a shebang inside it. Since executable files are relatively rare in software development scenarios, it could be a nice trade-off conceptually.
However, it would slow down formatting directory trees, especially when there are many files. We use filepath.WalkDir
, which only uses readdir
to read the contents of directories - that gives us the name of each file and its type (regular file, directory, symlink, etc). It does not give us the permission bits; an extra lstat
call per file is needed for that on Linux and Mac.
For more information, see https://github.com/golang/go/issues/42027.
I'm still happy to entertain a flag to tell shfmt to spend extra CPU time statting, opening, and reading many more files than it usually would. But I definitely don't want to make it the default behavior, because it will be much slower when there are many non-shell files, which is common. Walking one directory with a thousand Go files would suddenly go from one syscall (readdir) to over a thousand (readdir + 1000*lstat), for example.
Yeah, agreed on the default. Maybe a flag like --precise-identification
with the description, “Check shebang lines in executable files to identify shell scripts more precisely. This can have a performance impact on larger file trees and is not helpful if all of your shell scripts have standard extensions.”
Hi, I'm using
shfmt -f .
to list files I want to format. However, many files aren't found as they don't have a .sh/.bash extension, but are named something likemkosi.finalize
and I can't change the name as it is expected by some consumer. The files have, however, a valid shebang.The files are excluded because of the following line:
https://github.com/mvdan/sh/blob/279b806d8f2417af933a40a726ad8299395d9183/fileutil/file.go#L80
Would be great to just omit it, and do the shebang check in any case.