renzmann / treesit-auto

Automatic installation, usage, and fallback for tree-sitter major modes in Emacs 29
GNU General Public License v3.0
386 stars 29 forks source link

+TITLE: treesit-auto

+AUTHOR: Robb Enzmann

+html: GNU EmacsMelpaMelpa

Automatically install and use tree-sitter major modes in Emacs 29+. If the tree-sitter version can't be used, fall back to the original major mode.

There is also a convenience function =M-x treesit-auto-install-all=, which will install all (or a selected subset) of the maintained and compatible grammars. You can add these grammars to your =auto-mode-alist= automatically by invoking the =treesit-auto-add-to-auto-mode-alist= function in your configuration.

+begin_example

M-x package-install RET treesit-auto

+end_example

If you want a local clone of the repository, rather than just a copy of the source, you might instead use =package-vc-install=

+begin_src example

M-x package-vc-install RET https://github.com/renzmann/treesit-auto.git

+end_src

Then, in your Emacs configuration file (=~/.emacs.d/init.el=),

+begin_src emacs-lisp

(use-package treesit-auto :config (global-treesit-auto-mode))

+end_src

For most users, this will be enough. There are some nifty things you might want to enable, though, which are covered in the "Configuration" section below.

+begin_src emacs-lisp

(use-package treesit-auto :custom (treesit-auto-install 'prompt) :config (treesit-auto-add-to-auto-mode-alist 'all) (global-treesit-auto-mode))

+end_src

1. If the grammar is installed, then switch to the appropriate tree-sitter mode:

In this case, assuming we open a Python buffer, and the [[https://github.com/tree-sitter/tree-sitter-python][Python tree-sitter grammar]] is installed, then Emacs will use =python-ts-mode= instead of =python-mode=.

2. The grammar is NOT installed and treesit-auto-install is non-nil:

When the grammar is not installed and ~treesit-auto-install~ is t, then upon activating any major mode that has a corresponding tree-sitter mode, the grammar will be downloaded and compiled using ~treesit-install-language-grammar~. Emacs will then activate the tree-sitter major mode for that buffer.

~prompt~ is like t, except a message will be displayed in the echo area asking for a yes/no response before attempting the installation.

As an example for both cases: if I visit a Python file and didn't already have the grammar installed, I wind up with an installed grammar and a buffer using ~python-ts-mode~.

Otherwise, when ~treesit-auto-install~ is nil, it will try to fall back to another major mode as described in the following two rules.

3. If the grammar is NOT installed, and a fallback is specified

Most languages will have a fallback mode specified, such as =python-ts-mode= falling back to =python-mode=, if the grammar is not installed. If you ever need to double-check what that fallback will be, you can double check what's in the recipe for that language like this:

+begin_example

(treesit-auto-recipe-remap (alist-get 'python treesit-auto-lang-recipe-alist)) ⇒ python-mode

+end_example

See "Configuration/Configuring behavior for a specific language" in case you would like to specify different fallback modes than the default.

4. The grammar is installed, but there is no fallback mode

You can optionally use the =treesit-auto-add-to-auto-mode-alist= function to ensure that your installed tree-sitter languages have their corresponding =...-ts-mode= added to =auto-mode-alist=, so that Emacs opens the buffer in that =...-ts-mode=, rather than the default =Fundamental= mode.

Supposing for instance we don't have =typescript-mode= installed, then even if we /do/ have =typescript-ts-mode= installed along with the typescript grammar compiled in =~/.emacs.d/tree-sitter/=, Emacs still won't use =typescript-ts-mode= unless you also added ='("\.ts\'" . typescript-ts-mode)= to =auto-mode-alist=. By calling =(treesit-auto-add-to-auto-mode-alist)= in your configuration, this modification to =auto-mode-alist= is done automatically for you.

5. All other cases...

This is the most general case, where the grammar is not installed, ~treesit-auto-install~ is nil, and no fallback mode is specified in the language recipe present on =treesit-auto-recipe-list=. In this case, we still gain the benefit of quickly installing grammars through =treesit-install-language-grammar= without having the build the recipe interactively, but =treesit-auto= will make no attempt to switch away from the tree-sitter mode.

** Choose which languages =treesit-auto= should consider

You can globally alter the behavior of =treesit-auto= to only consider a specific set of languages by setting the =treesit-auto-langs= list to a set of language symbols. By default, this list includes every possible language that =treesit-auto= supports, so you can use =M-x describe-variable RET treesit-auto-langs= to see what the options are.

One way to use this variable is to just set it manually:

+begin_src emacs-lisp

(setq treesit-auto-langs '(python rust go))

+end_src

Now, =treesit-auto= features will only ever affect Python, Rust, and Go files. Running =treesit-auto-install-all= will only install these three grammars, and no automatic prompting/installation will occur when visiting a buffer that is not one of these three, either.

Another method is to disable specific languages by just removing them from this list:

+begin_src emacs-lisp

(delete 'awk treesit-auto-langs)

+end_src

Here, =treesit-auto= behaves as it normally would for all languages /except/ AWK.

** Automatically install grammars if they are missing The =treesit-auto-install= variable controls whether a grammar should be installed automatically when activating a major mode compatible with tree-sitter.

  1. =nil=, the default, means =treesit-auto= won't try to install anything, and will rely on the fallback logic outlined above
  2. =t= means =treesit-auto= should always try to clone and install a grammar when missing
  3. ~prompt~ will cause a yes/no prompt to appear in the minibuffer before attempting installation

+begin_src emacs-lisp

(setq treesit-auto-install 'prompt)

+end_src

Then, supposing I don't have =libtree-sitter-python.so= (or its mac/Windows equivalent) under =~/.emacs.d/tree-sitter= (or anywhere else in =treesit-extra-load-path=), visiting a Python file or calling =M-x python-ts-mode= will generate this prompt:

+begin_example

Tree-sitter grammar for python is missing. Would you like to install it from https://github.com/tree-sitter/tree-sitter-python? (y or n)

+end_example

Responding with "yes" will use =treesit-install-language-grammar= to go fetch and compile the missing grammar.

The other function that respects this variable is =treesit-auto-install-all=. When =treesit-auto-install= is t, using =M-x treesit-auto-install-all= will skip all prompts. Otherwise, it will ask before attempting the installation.

** Configuring behavior for a specific language The variable =treesit-auto-recipe-list= keeps track of all the language "recipes." These control how =treesit-auto= decides which modes to upgrade/downgrade to/from, where the source code of the language grammar is hosted, and which C/C++ compiler to use. Each recipe can take these arguments:

+begin_example

:lang :ts-mode :remap :url :revision :requires :source-dir :cc :c++

+end_example

To create a recipe, use =make-treesit-auto-recipe=:

+begin_src elisp

(setq my-js-tsauto-config (make-treesit-auto-recipe :lang 'javascript :ts-mode 'js-ts-mode :remap '(js2-mode js-mode javascript-mode) :url "https://github.com/tree-sitter/tree-sitter-javascript" :revision "master" :source-dir "src" :ext "\.js\'"))

(add-to-list 'treesit-auto-recipe-list my-js-tsauto-config)

+end_src

Here, we've specified that the tree-sitter compiler will be creating a file named =libtree-sitter-javascript.so= (or =.dylib= or =.dll=), based on the =:lang= field. The corresponding tree-sitter mode in Emacs is called =js-ts-mode=, and all of =js2-mode=, =js-mode=, and =javascript-mode= should attempt switching to the =js-ts-mode=, if possible.

Moreover, since =js-2-mode= is first under the =:remap= section, that is the "primary fallback." Meaning that if the tree-sitter grammar is not available, it will be the first mode tried. If that doesn't work, it will try =js-mode=, and =javascript-mode=, in that order, until one /does/ work. If only one fallback needs to be specified, a single quoted symbol is also acceptable. For instance, =python-ts-mode= just uses =:remap 'python-mode= in this argument position.

If a grammar mandates any other grammars be installed as a dependency, the =:requires= keyword can specify a language symbol or list of symbols that should be installed. One example of this is found in the TypeScript recipe, which specifies =:requires 'tsx=, since activating =typescript-ts-mode= on some Emacs builds will attempt to load the TSX grammar.

The =:url=, =:revision=, =:source-dir=, =:cc=, and =:c++= arguments are all documented under =treesit-language-source-alist=, which is part of base Emacs, not this package.

** Keep track of your hooks This package does not modify any of your major mode hooks. That is, if you have functions in =python-mode-hook=, but not in =python-ts-mode-hook=, then your hook from =python-mode= will not be applied, assuming =python-ts-mode= is what gets loaded. For major modes in which this is a concern, the current recommendation is to address this as part of your configuration.

+begin_src emacs-lisp

(setq rust-ts-mode-hook rust-mode-hook)

+end_src

Some modes have a shared base, such as =python-ts-mode= and =python-mode= both deriving from =python-base-mode=. For these languages, you can opt to hook into =python-base-mode-hook= instead of explicitly setting the tree-sitter mode's hook.

** Automatically register extensions for =auto-mode-alist= You can register tree-sitter modes to =auto-mode-alist= by calling =treesit-auto-add-to-auto-mode-alist=. Depending on the optional argument =langs=, this function can behave in three different ways:

  1. =nil=, the default - Only add tree-sitter modes to =auto-mode-alist= if the tree-sitter mode is available to Emacs /and/ the grammar is installed.
  2. ='all= - For every tree-sitter mode available to Emacs and in =treesit-auto-langs=, add it to =auto-mode-alist= regardless of whether the grammar is installed. This has the beneficial side effect of installing grammars for you when opening files that have a tree-sitter mode that comes with Emacs, but have no fallback mode. Examples of this are =rust-ts-mode=, =go-ts-mode=, and a few others in Emacs 29+.
  3. A list of language symbols - behaves like ='all=, but only for the listed languages

For instance, you might run this function as:

+begin_src emacs-lisp

(treesit-auto-add-to-auto-mode-alist '(rust go toml))

+end_src

This registers your tree-sitter modes according to the common file extension for Rust, Go, and TOML, but no other modes. Most users will probably want to use =(treesit-auto-add-to-auto-mode-alist 'all)= for the easiest general behavior; always prompting/installing grammars when we can.

We'll take two examples: TypeScript and Python.

Emacs 29 ships with =typescript-ts-mode= and =tsx-ts-mode=, but no equivalent =typescript-mode= or =tsx-mode=. =python-ts-mode= and =python-mode=, on the other hand, are both available out of the box. If you wanted these grammars to automatically install on launch, and then use the tree-sitter modes instead of the base modes for every file, you'd need the following code in your init file:

+begin_src emacs-lisp

(setq treesit-language-source-alist '((typescript . ("https://github.com/tree-sitter/tree-sitter-typescript" "master" "typescript/src")) (tsx . ("https://github.com/tree-sitter/tree-sitter-typescript" "master" "tsx/src")) (python . ("https://github.com/tree-sitter/tree-sitter-python"))))

(dolist (source treesit-language-source-alist) (unless (treesit-ready-p (car source)) (treesit-install-language-grammar (car source))))

(add-to-list 'auto-mode-alist '("\.ts\'" . typescript-ts-mode)) (add-to-list 'auto-mode-alist '("\.tsx\'" . tsx-ts-mode)) (add-to-list 'major-mode-remap-alist '(python-mode . python-ts-mode))

+end_src

There are plenty of reasons why some users would prefer to take this type of hand-tuned approach to their tree-sitter modes. For most Emacs people, though, you can see the natural progression of where a config like the above would go:

If you were to follow this chain yourself, you'd probably wind up with something in your =init.el= that looks very similar to the code in this package.

All in all, this is a small package. Roughly half of it is just maintaining a library of information for =treesit-language-source-alist=, and the other half works through the logic of handling edge cases related to the remaining bullets above.

Another side behavior you may encounter is when opening an Org document with shell scripts inside and =treesit-auto-install= is non-nil, then =treesit-auto= will prompt to install the Bash grammar, but won't display the prompt until you interact with Emacs in some way, such as using =next-line= (=C-n= by default) or hovering over Emacs with your mouse.

Issues are tracked on [[https://github.com/renzmann/treesit-auto/issues][GitHub]], which is also where patches and pull requests should be submitted.

If you would like to submit a new language recipe to be distributed as part of this package, see [[CONTRIBUTING.org][CONTRIBUTING.org]] for a quick guide on how to write and submit the new recipe.