latex-lsp / tree-sitter-latex

LaTeX grammar for tree-sitter
MIT License

Parse error when defining new environments #17

Status: Closed (Triton171 closed this 2 years ago)

Triton171 commented 2 years ago

When I try to define a custom environment, tree-sitter reports the definition as an error, which sometimes breaks the highlighting for the whole document (I'm using Helix). Here is a minimal example:

\documentclass{article}

\newenvironment{mycenter}
{\begin{center}}
{\end{center}}

\begin{document}
\begin{mycenter}
    Test
\end{mycenter}
\end{document}

This is probably a general problem, because LaTeX code may be syntactically invalid locally as long as it becomes valid after expansion.
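
To make the "locally invalid" point concrete, here is a small Python sketch. It is not part of the grammar, and the helper name `unmatched_environments` is made up for illustration. It pairs `\begin`/`\end` with a plain stack. Looked at in isolation, each argument of the `\newenvironment` above is unbalanced, while the document body is fine:

```python
import re

# Matches \begin{name} or \end{name}; group 1 is the keyword, group 2 the name.
ENV_TOKEN = re.compile(r"\\(begin|end)\{([^}]*)\}")


def unmatched_environments(source):
    """Return the (keyword, name) tokens in `source` that have no partner."""
    stack = []      # names of currently open environments
    unmatched = []  # \end tokens that do not close the innermost \begin
    for keyword, name in ENV_TOKEN.findall(source):
        if keyword == "begin":
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            unmatched.append(("end", name))
    # Anything still open at the end of the input was never closed.
    unmatched.extend(("begin", name) for name in stack)
    return unmatched


begin_arg = r"{\begin{center}}"  # second argument of \newenvironment
end_arg = r"{\end{center}}"      # third argument of \newenvironment
body = r"\begin{mycenter} Test \end{mycenter}"

print(unmatched_environments(begin_arg))  # [('begin', 'center')]
print(unmatched_environments(end_arg))    # [('end', 'center')]
print(unmatched_environments(body))       # []
```

A parser that insists on a matching `\end` inside the same brace group has nothing valid to produce for the first two inputs, which is where the error node comes from.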

I'm new to tree-sitter, so I'm not sure what the best way to solve this would be. Maybe we could have a node for command definitions inside which invalid or incomplete syntax is allowed. Or maybe we could allow incomplete syntax everywhere, but only when there is no other way to match it (I think this should be doable with precedence, but I'm not sure).

I can try to fix this but it'd be nice if someone with more experience could point me in the right direction first.

pfoerster commented 2 years ago

Thanks for the report.

At the moment, I am integrating an updated version of the grammar into texlab, where it is part of the main repository (https://github.com/latex-lsp/texlab/tree/feature/tree-sitter/lib/tree-sitter-latex). That grammar handles this case better: inside environment definitions, the parser does not try to match environments (instead, the \begin{center} is treated as a normal command). Note that there are a lot of other changes, made to simplify usage inside texlab.
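
As a rough illustration of that idea (a toy Python scanner, not the texlab grammar; `parse_environments` and `skip_brace_groups` are hypothetical names), the sketch below pairs `\begin`/`\end` everywhere except inside the brace groups that follow `\newenvironment` or `\renewenvironment`, where they are skipped like any other command. It assumes no optional `[...]` arguments and no escaped braces:

```python
import re

COMMAND = re.compile(r"\\([A-Za-z]+)")   # a control word such as \begin
NAME_ARG = re.compile(r"\{([^{}]*)\}")   # a simple {name} argument
DEFINITION_COMMANDS = {"newenvironment", "renewenvironment"}


def skip_brace_groups(source, pos):
    """Skip consecutive balanced {...} groups starting at or after `pos`."""
    while True:
        start = pos
        while start < len(source) and source[start].isspace():
            start += 1
        if start >= len(source) or source[start] != "{":
            return pos  # no further group: the definition ends here
        depth = 0
        for i in range(start, len(source)):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    pos = i + 1
                    break
        else:
            return len(source)  # unbalanced braces: best effort, stop here


def parse_environments(source):
    """Return (closed environment names, errors) outside definitions."""
    environments, errors, stack = [], [], []
    pos = 0
    while True:
        match = COMMAND.search(source, pos)
        if match is None:
            break
        name, pos = match.group(1), match.end()
        if name in DEFINITION_COMMANDS:
            # Inside a definition, \begin and \end are ordinary commands.
            pos = skip_brace_groups(source, pos)
        elif name in ("begin", "end"):
            arg = NAME_ARG.match(source, pos)
            if arg is None:
                errors.append((name, None))  # \begin or \end without a name
                continue
            pos, env = arg.end(), arg.group(1)
            if name == "begin":
                stack.append(env)
            elif stack and stack[-1] == env:
                environments.append(stack.pop())
            else:
                errors.append(("end", env))
    errors.extend(("begin", env) for env in stack)
    return environments, errors


example = r"""\documentclass{article}

\newenvironment{mycenter}
{\begin{center}}
{\end{center}}

\begin{document}
\begin{mycenter}
    Test
\end{mycenter}
\end{document}
"""

print(parse_environments(example))  # (['mycenter', 'document'], [])
```

On the example from the report, `center` never shows up as an environment and no errors are recorded, which mirrors the behaviour described above.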

This is probably a general problem, because LaTeX code may be syntactically invalid locally as long as it becomes valid after expansion.

This one is very hard, if not impossible, to fix, so this parser generally takes a "best-effort" approach.

clason commented 2 years ago

Out of interest, are you planning on making the updated version available separately?

Triton171 commented 2 years ago

Thanks a lot. I'm definitely going to check it out once the new version gets merged into texlab.

This one is very hard, if not impossible, to fix, so this parser generally takes a "best-effort" approach.

You're right, there is no way to correctly parse all valid LaTeX code with tree-sitter. The new approach looks like it'll work for anything you'd encounter in a regular LaTeX document, though, so that's really nice.

Out of interest, are you planning on making the updated version available separately?

In case you're thinking about it, I'm sure quite a few people (including myself) are interested in the tree-sitter grammar by itself (although I can understand if you don't want to maintain a separate repository).