yitzchak / tree-sitter-latex

LaTeX grammar for tree-sitter
MIT License
11 stars 5 forks source link

Matching environment names #1

Open Aerijo opened 5 years ago

Aerijo commented 5 years ago

In the TextMate grammar, we only match the end delim if it shares the exact same name as the start (although, we also restrict to word characters only). With Tree-sitter, I thought it'd be a great idea to carry this over. Until I didn't.

The reasoning being that we can now write a mini parser for the contents of the environment name, so we can reasonably match things like foo and f{o}o (or even

\begin{fo%
         o}

)

With the extra power though, we don't have the ability to know when environment names match when command sequences are used. One possibility is to again restrict to word characters. The other is to not match names (as this package currently does). Not matching is actually reasonable, because the job of alerting users to a mistake can be done by a linter, which has much more flexibility in knowing what (simple) commands expand to.

However, I'm currently of the mind to match "simple" names at the grammar level, and assume that "complex" names are a match. E.g.,

\begin{foo} \end{bar} % marked as invalid by grammar
\begin{foo} \end{\bar} % accepted by grammar, possibly picked up by linter

I've got the base code for this (adapted from tree-sitter-html), so it would really be just some fixes and a PR. This issue is to get thoughts on if the grammar should do this at all, or stay as it is.

yitzchak commented 5 years ago

Thanks for working this out!

I am currently leaning towards leaving this up to a linter, but would I definitely like to revisit the issue once the grammar is more complete. Because LaTeX is not context free I have the feeling that things could get very complex fast. Therefore I am hesitant to make the parser too restrictive or complex.

FYI: I have only basic support for a few special commands and \makeatletter, but not \ExplSyntaxOn yet. I have a partially working language-latex on my fork's add-tree-sitter branch. You'll have to remove firstLineMatch from latex.cson to make it work.

Aerijo commented 5 years ago

@yitzchak The makeatletter behaviour is something I once wanted, but then changed my mind on. It's unreliable, highly prone to users redefining their own commands for it, and flat out doesn't work in file types where it's a letter by default. I think it's better to assume @ is always a word character.

Edit: plus, it cuts the complexity in half; no more text_group AND text_group_at business.

Aerijo commented 5 years ago

@yitzchak It's currently very restrictive on environment names matching, requiring they match the end or the entire block is considered an error (though still parsed). It also fails completely to allow control sequences, making the entire block an error if one is used in the environment name.

From my point of view trying to use the parse tree, the fewer false errors that need special handling the better. For example, it might be fine to error with a simple plain text value in each delim, but as soon as control sequences or multiple lines are used it should assume the user knows what they are doing and leave it to the linter.

(E.g. multilines:

\begin{%
docu%
ment%
}

is perfectly valid and should not error.

yitzchak commented 5 years ago

I agree the environment names need a little work. This is also legal syntax, BTW.

\begin f
bar wibble
\end f