Open Brixy opened 1 year ago
In general Helix checks for the shebang if it can't find a file-types
entry that matches. The file-types entries though will match for filenames that match the exact file-type entry like c
(C), f
(Fortran) and t
(Perl). We check for exact matches so that we can support full filenames like Makefile
or go.mod
. (Relevant code is here)
We could change the form expected for full filenames to something similar to what we use for suffix file-types (see https://docs.helix-editor.com/master/languages.html#file-type-detection-and-the-file-types-key)
[[language]]
name = "ruby"
file-types = [{ filename = "Rakefile" }, "rb"]
but this would be a breaking change for languages.toml configuration.
Thank you very much for your detailed answer!
We could change the form expected for full filenames to something similar to what we use for suffix file-types … but this would be a breaking change for languages.toml configuration.
I cope well von one of the posted solutions; but I cannot judge whether a breaking change is justified ;-)
Please give me a hint if you want me to close this issue. Thanks a lot!
I'll leave this open and add some tags to it. Since there's a workaround (the languages.toml snippet) and it's unlikely to occur most of the time, I don't think it's worth making the breaking change to configuration for now. Eventually we want to replace the TOML configuration with something more powerful and we might be able to use that change as an opportunity to clean up some config options that need to be tweaked like this.
One option would be to change the format to:
file-types = [{ filename = "c" }, { extension = "c" }]
And then anything that isn't a struct is legacy compat.
I think this edge case is rare enough that a breaking change isn't justified (edit: I no longer think this, see comment below) But, if you are planning to change the configuration format, I'd suggest something like:
[[language]]
name = "git-config"
file-types = { filenames = [".gitmodules", ".gitconfig"], suffixes = [".git/config", ".config/git/config"] }
This way each string is explicitly listed as a filename/extension/suffix, unlike the current format where strings like "c"
match both filenames and extensions, when it should really only match extensions. It's also more concise than
file-types = [ { filename = ".gitmodules" }, { filename = ".gitconfig" } ]
, and if the one line gets too long, you can expand the table into:
[[language]]
name = "bash"
[language.file-types]
filenames = [".bash_login", ".bash_logout", ".bash_profile", ".bashrc", ".profile", ".zshenv", ".zlogin", ".zlogout", ".zprofile", ".zshrc", "APKBUILD", "PKGBUILD", "eclass", "ebuild", "bazelrc", ".bash_aliases"]
extensions = ["sh", "bash", "zsh"]
shebangs = ["sh", "bash", "dash", "zsh"]
I just hit this with a shell script named js
(JIRA search). It had a shebang but was interpreted as a JavaScript file.
I think syntax highlighting based on filenames is frequent enough to be a valid consideration:
Language | Filenames |
---|---|
Shell | .bash_aliases .bash_functions .bash_history .bash_logout .bash_profile .bashrc .cshrc .flaskenv .kshrc .login .profile .zlogin .zlogout .zprofile .zshenv .zshrc 9fs PKGBUILD bash_aliases bash_logout bash_profile bashrc cshrc gradlew kshrc login man profile zlogin zlogout zprofile zshenv zshrc |
Ruby | .irbrc .pryrc .simplecov Appraisals Berksfile Brewfile Buildfile Capfile Dangerfile Deliverfile Fastfile Gemfile Guardfile Jarfile Mavenfile Podfile Puppetfile Rakefile Snapfile Steepfile Thorfile Vagrantfile buildfile |
Starlark | BUCK BUILD BUILD.bazel MODULE.bazel Tiltfile WORKSPACE WORKSPACE.bazel |
TOML | Cargo.lock Gopkg.lock Pipfile pdm.lock poetry.lock |
YAML | .clang-format .clang-tidy .gemrc CITATION.cff glide.lock yarn.lock |
I'm personally in favor for this breaking change, in a format similar to what @Plasma-Vortex suggested. I think Shebangs should take priority, followed by path suffixes, filenames and then extensions:
[[language]]
name = "bash"
[language.file-types]
shebangs = ["ash", "bash", ...]
suffixes = []
filenames = [".bash_aliases", ".bash_history", ...]
extensions = [".sh", ".bash", ...]
[[language]]
name = "git-config"
[language.file-types]
shebangs = []
suffixes = [".git/config", ".config/git/config"]
filenames = [".gitmodules", ".gitconfig"]
extensions = []
I got the filenames in the first table from a YAML file maintained by Github:
Shell:
type: programming
color: "#89e051"
aliases:
- sh
- shell-script
- bash
- zsh
extensions:
- ".sh"
- ".bash"
- ".bats"
- ".cgi"
...
filenames:
- ".bash_aliases"
- ".bash_history"
- ".bash_logout"
- ".bash_profile"
- ".bashrc"
...
interpreters:
- ash
- bash
- dash
...
tm_scope: source.shell
ace_mode: sh
codemirror_mode: shell
codemirror_mime_type: text/x-sh
language_id: 346
We can use these to help us write the default configuration file. Hopefully, the defaults will be rich enough that most users won't have to touch the configuration files at all.
I don't see this upgrade being an issue for end users, I think this is a relatively self-explanatory scheme to migrate to.
I changed my mind after running into this issue myself (with a file named c
), and I've updated my earlier comment to reflect that. I wouldn't mind introducing a breaking change for this (and if we plan to change filetype detection eventually, we might as well do it earlier, since the languages Helix supports will only grow over time)
I still think we should use the format described above, and would be willing to implement it. Using Github's list of filenames and extensions sounds like a good idea.
Edit: Unfortunately I don't have the bandwidth to work on this. But if anyone wants to submit a PR for this, go ahead!
Just wanted to mention that this merged pull request(?) seems to have solved my initial issue.
EDIT: O sorry. I posted rubbish. I just forget about my workarounds. Still an issue. Sorry, guys!
The issue with files named c
, p
or f
was resolved with #8006 but file-types
is still preferred over shebangs
. IMO a shebang is a stronger hint that the file is in a particular language than the filename. Currently, systemd *.conf
shell scripts are detected as HOCON despite the shebang.
I agree that shebangs should always be preferred over file-types. My issue is with *.do
files containing recipes for the redo build system. These are shell scripts by default, but redo respects shebangs in recipes, so recipes can be written in any language. On top of that, redo advises against using a /bin/sh
shebang, since that disables the default mechanism of choosing the most POSIX compatible sh implementation. That is, helix needs to be configured to treat *.do
(and *.od
!) files as bash
by default (since that's apparently what helix thinks "shell" is) but also prefer shebang detection over that default. This is currently impossible.
Hi!
In my
bin
folder there are some single letter POSIX/Bash executables:c
(calendar),p
(password script) andf
(fuzzy finder).Although the correct shebangs are provided (
#!/usr/bin/[sh,bash]
) Helix recognizes the files’ languages asc
,perl
andfortran
.It is easy to solve this problem, e.g. by renaming the files and using the shells abbreviations or aliases, by adding
[.sh,.bash]
file extensions or by addingfile-types
tolanguales.toml
like this:I just wanted to mention this in case it is a Helix issue.
Should helix generelly prioritize the shegang if no file extension is provided?
Thanks a lot!