Open TristanCacqueray opened 2 years ago
I hadn't heard of tree-sitter before. Looking into it a bit, it looks cool, but I don't understand what the benefit would be compared to just implementing more of LSP, so that editors get semantic information about tokens which they can use for syntax highlighting. Indeed, the tree-sitter plugin for VSCode appears to be deprecated for exactly this reason.
To be more blunt, maintaining a separate parser in whatever custom grammar description language tree-sitter uses, and having to keep it up-to-date every time we change the swarm-lang syntax, sounds absolutely awful. I would much, much, much rather get nice syntax highlighting via LSP, which means we can just piggyback on the existing Haskell parser for swarm-lang.
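For context on what "syntax highlighting via LSP" involves on the wire: the LSP semantic tokens feature sends each token as five integers (line delta, start-character delta, length, token type index, modifier bitset), with positions relative to the previous token. The following is only a sketch of that encoding in Python, not swarm's language server code; the `TOKEN_TYPES` legend, the `encode_semantic_tokens` name, and the sample token positions are all made up for illustration.

```python
from typing import List, Tuple

# Hypothetical legend: a real server advertises its own list of token
# types in its capabilities, and the editor maps indices back to names.
TOKEN_TYPES = ["keyword", "variable", "number"]

# (line, start_char, length, token_type) with absolute, zero-based positions.
Token = Tuple[int, int, int, str]


def encode_semantic_tokens(tokens: List[Token]) -> List[int]:
    """Flatten tokens into the relative five-integer-per-token layout."""
    data: List[int] = []
    prev_line, prev_start = 0, 0
    for line, start, length, kind in sorted(tokens):
        delta_line = line - prev_line
        # The start is relative to the previous token only on the same line.
        delta_start = start - prev_start if delta_line == 0 else start
        # The last field is the modifier bitset; this sketch sets none.
        data += [delta_line, delta_start, length, TOKEN_TYPES.index(kind), 0]
        prev_line, prev_start = line, start
    return data


# Made-up positions: a keyword, a variable and a number on line 0,
# then another keyword at the start of line 1.
sample = [
    (0, 0, 3, "keyword"),
    (0, 4, 1, "variable"),
    (0, 8, 1, "number"),
    (1, 0, 3, "keyword"),
]
encoded = encode_semantic_tokens(sample)
```

The token list itself would come from the existing Haskell parser, which is the appeal of this route: no second grammar to keep in sync.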
Good points, though according to the masteringemacs article linked above, it seems like LSP is a poor fit for syntax highlighting. I guess it's worth a try.
I agree it's awful to duplicate the work, but we are kind of already doing this for emacs with regex, and vscode with textmate. Perhaps using tree-sitter as a drop-in replacement is not as bad as it sounds, and we could get vim support for free.
For vscode, it seems like https://github.com/microsoft/vscode-anycode is the extension that leverages tree-sitter.
Ah, OK, that makes sense. To summarize, some of the main reasons that article claims tree-sitter gives much better performance than LSP:
I still don't really like the idea of having to maintain two separate parsers in parallel, but I'm open to the possibility that it might be worth it.
Edited to add: Though I note that even https://github.com/microsoft/vscode-anycode says "This extension should be used when running in enviroments that don't allow for running actual language services."
https://www.masteringemacs.org/article/tree-sitter-complications-of-parsing-languages seems to have been written 3 or 4 years ago; I would be curious to learn what (if anything) has changed since then.
What would writing a tree-sitter grammar for Swarm look like? A few thoughts/notes from https://tree-sitter.github.io/tree-sitter/creating-parsers#writing-the-grammar :
… Term algebraic data type.
… try, which may cause issues, such as parsing a noop {} and in parseStmt.
@byorgey actually, is Swarm an LR(1) language? AFAIK the noop and operator (+ / ++) cases can be resolved with one character lookahead.
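To make the lookahead point concrete, here is a toy Python lexer that separates `+` from `++` and recognizes a noop `{}` by peeking at a single following character. It is only a sketch under assumptions: it is not swarm's parser, the `tokenize` function and the `NOOP` token name are invented here, and it assumes the noop is written with no whitespace between the braces. It also only shows character-level lookahead in a lexer, which is a weaker statement than the grammar as a whole being LR(1).

```python
from typing import List


def tokenize(src: str) -> List[str]:
    """Toy lexer: resolves + vs ++ and { vs {} with one character of lookahead."""
    tokens: List[str] = []
    i = 0
    while i < len(src):
        ch = src[i]
        # Peek at exactly one character past the current one.
        nxt = src[i + 1] if i + 1 < len(src) else ""
        if ch.isspace():
            i += 1
        elif ch == "+" and nxt == "+":
            tokens.append("++")
            i += 2
        elif ch == "{" and nxt == "}":
            # Hypothetical token name for the empty block.
            tokens.append("NOOP")
            i += 2
        elif ch in "+{};":
            tokens.append(ch)
            i += 1
        elif ch.isalnum() or ch == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(src[i:j])
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens
```

For example, `tokenize("a ++ b + 1")` yields `["a", "++", "b", "+", "1"]`, and `tokenize("{}")` yields `["NOOP"]`, while `tokenize("{ move }")` keeps the braces as separate tokens.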
I am not sure. I think so. But it's been a long time since I thought about various grammar classifications.
Is your feature request related to a problem? Please describe.
The syntax highlighting support seems a bit fragile. While it seems to work, I wonder if it can be improved by using a tree-sitter grammar.
Describe the solution you'd like
A tree-sitter grammar to be added to the swarm project by following https://tree-sitter.github.io/tree-sitter/creating-parsers. Then it should be integrated in https://github.com/emacs-tree-sitter/tree-sitter-langs/tree/599570cd2a6d1b43a109634896b5c52121e155e3/repos. For vim, swarm can provide a sample configuration to load the grammar: https://github.com/nvim-treesitter/nvim-treesitter#adding-parsers.
Describe alternatives you've considered
https://www.masteringemacs.org/article/tree-sitter-complications-of-parsing-languages mentions CEDET, but that seems to be superseded by tree-sitter.