helix-editor / helix

A post-modern modal text editor.
https://helix-editor.com
Mozilla Public License 2.0
32.95k stars 2.43k forks source link

Prioritize shebang if no file extension is provided #5869

Open Brixy opened 1 year ago

Brixy commented 1 year ago
Helix: compiled from master branch
OS: Void

Hi!

In my bin folder there are some single letter POSIX/Bash executables: c (calendar), p (password script) and f (fuzzy finder).

Although the correct shebangs are provided (#!/usr/bin/[sh,bash]) Helix recognizes the files’ languages as c, perl and fortran.

It is easy to solve this problem, e.g. by renaming the files and using the shells abbreviations or aliases, by adding [.sh,.bash] file extensions or by adding file-types to languales.toml like this:

[[language]]
name = "bash"
file-types = ["c", "f", "t"]
shebangs = ["sh", "bash", "dash", "zsh"]
# …

I just wanted to mention this in case it is a Helix issue.

Should helix generelly prioritize the shegang if no file extension is provided?

Thanks a lot!

the-mikedavis commented 1 year ago

In general Helix checks for the shebang if it can't find a file-types entry that matches. The file-types entries though will match for filenames that match the exact file-type entry like c (C), f (Fortran) and t (Perl). We check for exact matches so that we can support full filenames like Makefile or go.mod. (Relevant code is here)

We could change the form expected for full filenames to something similar to what we use for suffix file-types (see https://docs.helix-editor.com/master/languages.html#file-type-detection-and-the-file-types-key)

[[language]]
name = "ruby"
file-types = [{ filename = "Rakefile" }, "rb"]

but this would be a breaking change for languages.toml configuration.

Brixy commented 1 year ago

Thank you very much for your detailed answer!

We could change the form expected for full filenames to something similar to what we use for suffix file-types … but this would be a breaking change for languages.toml configuration.

I cope well von one of the posted solutions; but I cannot judge whether a breaking change is justified ;-)

Please give me a hint if you want me to close this issue. Thanks a lot!

the-mikedavis commented 1 year ago

I'll leave this open and add some tags to it. Since there's a workaround (the languages.toml snippet) and it's unlikely to occur most of the time, I don't think it's worth making the breaking change to configuration for now. Eventually we want to replace the TOML configuration with something more powerful and we might be able to use that change as an opportunity to clean up some config options that need to be tweaked like this.

askreet commented 1 year ago

One option would be to change the format to:

file-types = [{ filename = "c" }, { extension = "c" }]

And then anything that isn't a struct is legacy compat.

howard36 commented 1 year ago

I think this edge case is rare enough that a breaking change isn't justified (edit: I no longer think this, see comment below) But, if you are planning to change the configuration format, I'd suggest something like:

[[language]]
name = "git-config"
file-types = { filenames = [".gitmodules", ".gitconfig"], suffixes = [".git/config", ".config/git/config"] }

This way each string is explicitly listed as a filename/extension/suffix, unlike the current format where strings like "c" match both filenames and extensions, when it should really only match extensions. It's also more concise than file-types = [ { filename = ".gitmodules" }, { filename = ".gitconfig" } ], and if the one line gets too long, you can expand the table into:

[[language]]
name = "bash"
[language.file-types]
filenames = [".bash_login", ".bash_logout", ".bash_profile", ".bashrc", ".profile", ".zshenv", ".zlogin", ".zlogout", ".zprofile", ".zshrc", "APKBUILD", "PKGBUILD", "eclass", "ebuild", "bazelrc", ".bash_aliases"]
extensions = ["sh", "bash", "zsh"]
shebangs = ["sh", "bash", "dash", "zsh"]
rcorre commented 1 year ago

I just hit this with a shell script named js (JIRA search). It had a shebang but was interpreted as a JavaScript file.

k12ish commented 1 year ago

I think syntax highlighting based on filenames is frequent enough to be a valid consideration:

Language Filenames
Shell .bash_aliases .bash_functions .bash_history .bash_logout .bash_profile .bashrc .cshrc .flaskenv .kshrc .login .profile .zlogin .zlogout .zprofile .zshenv .zshrc 9fs PKGBUILD bash_aliases bash_logout bash_profile bashrc cshrc gradlew kshrc login man profile zlogin zlogout zprofile zshenv zshrc
Ruby .irbrc .pryrc .simplecov Appraisals Berksfile Brewfile Buildfile Capfile Dangerfile Deliverfile Fastfile Gemfile Guardfile Jarfile Mavenfile Podfile Puppetfile Rakefile Snapfile Steepfile Thorfile Vagrantfile buildfile
Starlark BUCK BUILD BUILD.bazel MODULE.bazel Tiltfile WORKSPACE WORKSPACE.bazel
TOML Cargo.lock Gopkg.lock Pipfile pdm.lock poetry.lock
YAML .clang-format .clang-tidy .gemrc CITATION.cff glide.lock yarn.lock

I'm personally in favor for this breaking change, in a format similar to what @Plasma-Vortex suggested. I think Shebangs should take priority, followed by path suffixes, filenames and then extensions:

[[language]]
name = "bash"
[language.file-types]
shebangs = ["ash", "bash", ...]
suffixes = []
filenames = [".bash_aliases", ".bash_history", ...]
extensions = [".sh", ".bash", ...]

[[language]]
name = "git-config"
[language.file-types]
shebangs = []
suffixes = [".git/config", ".config/git/config"]
filenames = [".gitmodules", ".gitconfig"]
extensions = []

Implementation

I got the filenames in the first table from a YAML file maintained by Github:

Shell:
  type: programming
  color: "#89e051"
  aliases:
  - sh
  - shell-script
  - bash
  - zsh
  extensions:
  - ".sh"
  - ".bash"
  - ".bats"
  - ".cgi"
  ...
  filenames:
  - ".bash_aliases"
  - ".bash_history"
  - ".bash_logout"
  - ".bash_profile"
  - ".bashrc"
  ...
  interpreters:
  - ash
  - bash
  - dash
  ...
  tm_scope: source.shell
  ace_mode: sh
  codemirror_mode: shell
  codemirror_mime_type: text/x-sh
  language_id: 346

We can use these to help us write the default configuration file. Hopefully, the defaults will be rich enough that most users won't have to touch the configuration files at all.

I don't see this upgrade being an issue for end users, I think this is a relatively self-explanatory scheme to migrate to.

howard36 commented 1 year ago

I changed my mind after running into this issue myself (with a file named c), and I've updated my earlier comment to reflect that. I wouldn't mind introducing a breaking change for this (and if we plan to change filetype detection eventually, we might as well do it earlier, since the languages Helix supports will only grow over time)

I still think we should use the format described above, and would be willing to implement it. Using Github's list of filenames and extensions sounds like a good idea.

Edit: Unfortunately I don't have the bandwidth to work on this. But if anyone wants to submit a PR for this, go ahead!

Brixy commented 1 year ago

Just wanted to mention that this merged pull request(?) seems to have solved my initial issue.

EDIT: O sorry. I posted rubbish. I just forget about my workarounds. Still an issue. Sorry, guys!

Ordoviz commented 1 month ago

The issue with files named c, p or f was resolved with #8006 but file-types is still preferred over shebangs. IMO a shebang is a stronger hint that the file is in a particular language than the filename. Currently, systemd *.conf shell scripts are detected as HOCON despite the shebang.