Open Aerijo opened 5 years ago
Thanks for working this out!
I am currently leaning towards leaving this up to a linter, but would I definitely like to revisit the issue once the grammar is more complete. Because LaTeX is not context free I have the feeling that things could get very complex fast. Therefore I am hesitant to make the parser too restrictive or complex.
FYI: I have only basic support for a few special commands and \makeatletter
, but not \ExplSyntaxOn
yet. I have a partially working language-latex
on my fork's add-tree-sitter branch. You'll have to remove firstLineMatch
from latex.cson
to make it work.
@yitzchak The makeatletter
behaviour is something I once wanted, but then changed my mind on. It's unreliable, highly prone to users redefining their own commands for it, and flat out doesn't work in file types where it's a letter by default. I think it's better to assume @
is always a word character.
Edit: plus, it cuts the complexity in half; no more text_group
AND text_group_at
business.
@yitzchak It's currently very restrictive on environment names matching, requiring they match the end or the entire block is considered an error (though still parsed). It also fails completely to allow control sequences, making the entire block an error if one is used in the environment name.
From my point of view trying to use the parse tree, the fewer false errors that need special handling the better. For example, it might be fine to error with a simple plain text value in each delim, but as soon as control sequences or multiple lines are used it should assume the user knows what they are doing and leave it to the linter.
(E.g. multilines:
\begin{%
docu%
ment%
}
is perfectly valid and should not error.
I agree the environment names need a little work. This is also legal syntax, BTW.
\begin f
bar wibble
\end f
In the TextMate grammar, we only match the end delim if it shares the exact same name as the start (although, we also restrict to word characters only). With Tree-sitter, I thought it'd be a great idea to carry this over. Until I didn't.
The reasoning being that we can now write a mini parser for the contents of the environment name, so we can reasonably match things like
foo
andf{o}o
(or even)
With the extra power though, we don't have the ability to know when environment names match when command sequences are used. One possibility is to again restrict to word characters. The other is to not match names (as this package currently does). Not matching is actually reasonable, because the job of alerting users to a mistake can be done by a linter, which has much more flexibility in knowing what (simple) commands expand to.
However, I'm currently of the mind to match "simple" names at the grammar level, and assume that "complex" names are a match. E.g.,
I've got the base code for this (adapted from
tree-sitter-html
), so it would really be just some fixes and a PR. This issue is to get thoughts on if the grammar should do this at all, or stay as it is.