sogaiu / tree-sitter-clojure

Clojure(Script) grammar for tree-sitter
Creative Commons Zero v1.0 Universal

Add `tags.scm` file #59

Closed hewcaw closed 5 months ago

hewcaw commented 5 months ago

I'm trying to add Clojure/CLJS support for https://github.com/paul-gauthier/aider/issues/373 and it requires this file. I don't know anything about Tree-sitter but do really want to help, so if you have some free time to spare it would be really helpful if you could provide some instruction / direction, thank you!

sogaiu commented 5 months ago

I looked into tags.scm [1] a bit before, and the conclusion I reached at that point was that, due to the way tree-sitter-clojure is constructed (partly because of present tree-sitter limitations / functionality / scope), it didn't seem likely that a particularly meaningful tags.scm file could be created...but perhaps it depends on the details of how it's to be applied.

To explain a bit, taking a look at one of the pages referenced in the issue you linked to, one can see:

Class definitions           @definition.class
Function definitions        @definition.function
Interface definitions       @definition.interface
Method definitions          @definition.method
Module definitions          @definition.module
Function/method calls       @reference.call
Class reference             @reference.class
Interface implementation    @reference.implementation

Unlike most other tree-sitter grammars, tree-sitter-clojure (and a number of other comparable things for lisp-likes) doesn't yield parse trees with "higher level" constructs (e.g. function definitions). See here for background and rationale.

The only types of things that end up in a parse tree are things like lists, vectors, and other "low level" constructs. Even function definitions are not represented in the tree (see link above for a discussion of that particular point, but it's an example of a more general issue).

I'm not sure how aider uses tags.scm files. Perhaps if that was clearer, one could make a better assessment. This directory at the aider repository contains *-tags.scm files for a variety of languages. If you take a look at the one for elisp:

;; defun/defsubst
(function_definition name: (symbol) @name.definition.function) @definition.function

;; Treat macros as function definitions for the sake of TAGS.
(macro_definition name: (symbol) @name.definition.function) @definition.function

;; Match function calls
(list (symbol) @name.reference.function) @reference.function

I think this makes it clear that, for at least tree-sitter-elisp, there is a reliance on the grammar providing some ability to recognize definitions (e.g. IIUC function_definition and macro_definition are names of nodes that the corresponding grammar recognizes) and calls.

The third item (the one that is like (list ...)) hints that it might be of some use to "fake" recognition of function calls by picking out (something like) "lists that start with symbols". That will match other things though, including macro calls as well as uses of special forms. I don't know if that makes a difference for accuracy in the context of aider.
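
As a rough illustration of that idea carried over to Clojure, here is a minimal sketch of a query. It assumes the node names list_lit and sym_lit from this repository's grammar.js and reuses the capture names from the elisp file above; it has not been tested against aider.

```scheme
;; Sketch only: treat any list whose first named child is a symbol as a "call".
;; Assumes tree-sitter-clojure's list_lit and sym_lit node names.
;; The leading "." anchors the symbol to the first named child of the list.
;; Note that this also captures macro calls and special forms such as
;; (if ...) or (let ...), not just function calls.
(list_lit
  .
  (sym_lit) @name.reference.function) @reference.function
```

Lists that begin with metadata or a comment would likely be missed by the anchor, so this is a starting point for experiments rather than a complete rule.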

I think it could be useful to try to ascertain how well aider works with Elisp (maybe the aider folks can be asked or you can test directly?). If it doesn't work that well, I would suspect that it's not going to work better for Clojure (since tree-sitter-clojure lacks things like function_definition, it seems likely to me that results are unlikely to ever be better than those for elisp -- just a guess though), though I suppose one may not know until something is tried.


If you still want to try to get aider working with Clojure, the following are some things that might help (note: I suggest browsing the following items and thinking over an initial plan, mix and match, change ordering, etc. (you can revisit / revise later)):

  1. Scan this part of the official tree-sitter docs. It's a lot to take in (and I wouldn't try to really understand it in detail on a single reading), but I think it's good to know what kinds of docs exist. The kind of information you'll likely want to be able to make sense of has to do with what's in a grammar.js file and what the bits mean. It also has some background which will likely turn out to be useful when working with tree-sitter.
  2. Clone Wilfred's tree-sitter-elisp repository. According to this aider document, that's the elisp grammar being used (there is actually another one in existence that is a fork with some changes). Then start looking over the contained grammar.js. I think you'll find list, function_definition and macro_definition mentioned. Try to understand the associated definitions. These might not directly help much for Clojure's case, but it might be a good focused initial task for starting to become familiar with typical grammar.js files -- and Emacs Lisp is much closer to Clojure than many other programming languages.
  3. Try to understand how the content of tree-sitter-elisp-tags.scm works (the whole content is considered a single query in tree-sitter lingo -- a bit confusing, but that's how it is). For this, I suggest becoming familiar with at least this section of the official docs. It may be that learning interactively with the tree-sitter playground could be helpful, but that version doesn't have any lisp-likes. If you install the tree-sitter cli locally and have the appropriate setup, you can run a local version of the playground for specific grammars, but this may be a fair bit of work depending on your background, motivation, energy, etc. This is because to get that playground running you may need to build an appropriate .wasm file and that has some finicky bits. This page might have some useful information regarding this. Note that the tree-sitter cli program has a subcommand named query which can be used to test out queries -- the playground can be nice (especially initially), but it's not strictly necessary.
  4. Figure out some way to test / experiment with tree-sitter-elisp-tags.scm. I don't know anything about aider, but if you can run it locally, that would likely be helpful because you can iterate over different versions of your file. Running locally is not necessarily essential; what is important is to be able to try different versions of the file in a manner that doesn't take too much time to see the effects of changes you make. There might be suggestions about how to do something appropriate at the aider repository. This document looks like it has a couple of relevant looking items.
  5. Once you are comfortable with tweaking tree-sitter-elisp-tags.scm, try to sketch out what might be necessary for Clojure's case and test with that. At a minimum, creating a tree-sitter-clojure-tags.scm file that contains something similar to the (list ...) bit (this will be for identifying "calls" / "references") discussed earlier seems like it could be a good starting point, though I suspect it will be necessary to be able to express an expression that helps to identify definitions as well. The node names are different in tree-sitter-clojure (from those in tree-sitter-elisp), so you'll likely want to check the grammar.js file in this repository (e.g. it's not list here, but rather list_lit). However, there might be other necessary bits. For example, it looks like aider depends on py-tree-sitter-languages, which doesn't appear to list Clojure. It may be that getting support for tree-sitter-clojure added there is necessary (at least eventually if aider uses py-tree-sitter-languages). In the near term, it may be enough to create a fork that includes Clojure support and get whatever version of aider you are testing with to use that instead. There might be other things that need to be changed, but maybe the aider folks can help identify those things.
  6. Consider joining the tree-sitter discord (see the top of this page) if you haven't already. There are people there who might be able to help you out from a general tree-sitter perspective. I don't know if there are any support channels for aider (apart from its issues), but it seems worth checking.
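
For the "identify definitions" part of item 5, one possible shape for an expression is sketched below. Everything in it is an assumption to be checked: the node names list_lit and sym_lit come from this repository's grammar.js, the helper capture @_head is a hypothetical name, and whether the #match? predicate or an extra helper capture is accepted depends on the tool that runs the query (aider, the tree-sitter cli, etc.).

```scheme
;; Sketch only: a list whose first named child is the symbol defn, defn-
;; or defmacro, immediately followed by another symbol, is treated as a
;; function definition (macros are lumped in, as in the elisp file).
;; @_head is a hypothetical helper capture used only by the predicate.
(list_lit
  .
  (sym_lit) @_head
  .
  (sym_lit) @name.definition.function
  (#match? @_head "^(defn-?|defmacro)$")) @definition.function
```

Because the grammar itself has no notion of a definition, this only recognizes the literal spellings listed in the regex; namespace-qualified heads or user-defined defining macros would need additional patterns.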

Maybe there is something of use in the above (^^;

I'm ok to discuss this further if you wish -- not sure how much and for how long, but we'll see as it goes.


[1] It's tags.scm btw, not tags.csm :)

hewcaw commented 5 months ago

Holy Jesus Moly Christ, I didn't expect all of this, thought that it is something that you guys left out because there's no use of it. Kind of implied it is somewhat easier to get into so I tried to be "falsely-humble" with the intention of making you guys implement it, haha. But with this god level type of reply I guess I have no choice but to stay.

For context, I was trying to dig into https://github.com/logseq/logseq/ and thought tools like aider and continue.dev would be really helpful for a beginner to learn a new "strange" language and a complex codebase. I started to understand a bit and enjoy the language somewhat, and now want to speed up the process more, so finding out that it is not yet supported by those tools is such a bummer.

Anyway, with that being said, I guess I need some "departure time" to grasp all of this 🤣. Still unsure but I'll try to put in some "unneeded effort" and report back. Appreciate this a lot brother, very super-duper helpful. Dude, you probably spent a good portion of time writing all of that. It's just... wow... I'm still shocked. Man, thank you very much. 😘

[1] Dang it! Sorry, knew something is wrong but I was too impulsive, LOL.

sogaiu commented 5 months ago

Thanks for the additional context.

If you decide to continue further, please feel free to reach out at that point. It's possible I might still be inclined to discuss more -- who knows what the future holds, right?

As for adding a tags.scm file, I think from the investigation above it seems clear it would be specific to aider so I don't think we'd be adding it here (a similar thing holds for highlighting files -- e.g. neovim has its own, helix editor has theirs, etc., so this is not unusual in the tree-sitter world).

Perhaps you're ok with closing this particular issue for the moment then?

Take care :)

hewcaw commented 5 months ago

That makes total sense. Once again, many thanks. ❤️