lervag / vimtex

VimTeX: A modern Vim and neovim filetype plugin for LaTeX files.
MIT License
5.53k stars 391 forks source link

cmd get_current returns empty if optional arguments are after mandatory ones #1983

Closed poetaman closed 3 years ago

poetaman commented 3 years ago

vimtex#cmd#get_current() explained in comment: https://github.com/lervag/vimtex/issues/1981#issuecomment-792263781, returns an empty value if optional argument is after curly argument, unless the name of the macro is \begin.

Why distinguish between color of optional arguments of \begin, and those of \setmainfont. Both are just macros for TeX/LaTeX. Perhaps this is rooted in the fact that earlier commands in TeX world required optional argument to be before mandatory ones. That restriction does not exist for modern key-value based commands, of which there are many... including fontspec macros. Most of the fontspec code online uses this syntax pattern: optional font parameters after font name. This will also truly resolve https://github.com/lervag/vimtex/issues/1981

Screen Shot 2021-03-08 at 12 53 34 PM

I came across this problem while after coding a function to determine the name of the current command/enviornment. Put the following in your vimrc file, move the cursor to parts of macro arguments (in a .tex file), hit <C-r>, and read messages with :messages

function! s:printVimTeXScope()
    echomsg "==== PrintVimTeXScope ===="
    let l:cmd = vimtex#cmd#get_current()
    if !empty(l:cmd)
        echomsg "cmd:" . l:cmd.name[1:]
        if l:cmd.name[1:] ==# "begin"
            let [l:open, l:close] = vimtex#delim#get_surrounding('env_tex')
            if !empty(l:open)
                echomsg "env:" . l:open.name
            else
                echomsg "ERROR: env empty"
            endif
        endif
    else
        echomsg "ERROR: cmd empty"
    endif
    return ""
endfunction

inoremap <buffer> <expr> <C-r> <SID>printVimTeXScope()

Sample code block from above screenshot: (as you will notice, pressing <C-r> inside [] of \setmainfont will return ERROR: cmd empty)

\setmainfont{LucidaBrightOT}[
    Extension = .otf,
    Path = fonts/LucidaSans/,
    UprightFont = *,
    BoldFont = *-Demi,
    ItalicFont = *-Italic,
    BoldItalicFont = *-DemiItalic
]

\begin{helloworld}[
    Extension = .otf,
    Path = fonts/LucidaSans/,
    UprightFont = *,
    BoldFont = *-Demi,
    ItalicFont = *-Italic,
    BoldItalicFont = *-DemiItalic
]
lervag commented 3 years ago

It would help if you can be more clear what you think is wrong and how you think it should be fixed. As it stands, I have to guess. Even though I probably guess right, it is much better if I don't need to guess.

My guess: You think I should update vimtex#cmd#get_current() (and related functions) to also allow options after arguments as a general rule. I.e., to relax the current implementation to gobble any number arguments/options regardless of the order in which they appear. So, given \cmd[a]{b}[c], this should give result in the parsed command containing both a and c in the opts list.

Note: As explained, it is really hard or impossible to actually determine if something is an actual argument/option group or not, so we need to specify a behaviour that works as expected nearly always, but that we must allow to probably fail in some cases. We won't achieve perfect, so let's figure out the next best thing. I think I'm pretty close, but I'll accept that the current behaviour might need an update, but this would require a slightly clearer description and explanation.

poetaman commented 3 years ago

Yes, I think you got it right :) Will keep a note on adding more exact note on expected behavior in future. Yes this syntax \cmd[a]{b}[c], with optional arguments at either/both ends is a good place to begin. Calling vimtex#cmd#get_current() while cursor is in any of those arguments should return the command/macro name \cmd

P.S. I have seen macros with optional arguments on both ends of mandatory arguments like \cmd[a]{b}[c]. Though don't remember seeing optional arguments to appear anywhere between positional arguments.

lervag commented 3 years ago

The question is: should we allow spaces between }{, ]{, }[ and, ][? I think doing so would lead to more false positives, but if we don't allow them, it will lead to a lot of failed parses. An example to show what I mean with false positives:

$\frac{1}{2} {\color{red} a}$ % how to prevent the last {...} to be recognized as an argument?

Whatever we do, it will end up being wrong in some way. Still, if vimtex#cmd#get_*() simply catches the largest possible command atom, then it will be the user of the APIs problem to deal with the result. Thus, when we work with a parsed \frac command, we can safely ignore any erroneously catched arguments.

The current behaviour allows any number of whitespace characters, including newlines.

My current idea is that it seems relatively safe to adjust the behaviour as follows:

  1. Match command name \word
  2. If \begin, then match the name argument
  3. Match an overlay argument (for beamer) <>
  4. Greedily match any number of {...} and [...] that follows, in any order.

I think a possible alternative is to adjust step 4 to greedily match first [...], then {...}, then again [...]. The only differen ce from the current behaviour would in this case be the very final matching of the [...] at the end.

poetaman commented 3 years ago

@lervag Good point, I think it boils down to this rule: Starting from the macro name \<somemacro>, keep including {.*},[.*],(.*) option groups till the opening delimiter {,[,( of some group ([],{},()) does not "touch" the closing delimiter of last included option group.

What does "touch" mean? If they (opening delimiter of potential option group, and closing one of last option group) are right next to each other on same line, i.e there is no space in between, then they "touch". Or if they are on different lines, and all lines in between end with comment % and just have white space/comments in before we reach the opening delimiter, then too they "touch".

$\frac{1}{2} {\color{red} a}$ % how to prevent the last {...} to be recognized as an argument? The above rule should cover this, and all other such cases... Whitespace between macros argument delimiters on same line does not make sense (it does seem to compile for some macros), but is just a bad coding style for TeX as one cannot decipher the meaning by looking at TeX code... So it makes little sense to support that, IMHO.

I did some testing on a sample tex file I have that has textpos environment (note the parentheses delimiters for last argument):

\begin{textblock*}{}[,](,)

and it seems like spaces are allowed between some of the arguments of textblock even when they are on same line. Though if you ask TeX experts on https://tex.stackexchange.com, I bet they will recommend against writing such code (as it lends easily itself to ambivalence).

This pattern compiles:

\begin{textblock*}{3in}[0,0]%
    %   
    %   
    (3in,1in)%

and this too:

\begin{textblock*}{3in}[0,0]%
    %   
    %   
    (3in,%
      1in)%

But this does not compile (as atleast one line does not end with % )

\begin{textblock*}{3in}[0,0]
    %   
    %   
    (3in,%
    1in)%

Feel free to ask more questions, and you can also refer this on https://tex.stackexchange.com to see what TeX experts have to say about my proposal.

poetaman commented 3 years ago

And, regarding 2. If \begin, then match the name argument: In the current API there is a way to distinguish whether cursor is in command or environment, hopefully that will remain that way (just confirming). That's because there are environments & commands that have same name & do same thing; good packages provide that for cases where a property is applied on a small piece of text & its too much verbosity to have an environment around such text.

Lastly, it will be a great addition of capability if the the API can give one more information: The number of argument cursor is in. For instance in a macro command like this: \begin{textblock*}{3in}[0,0](3in,1in), it will return 1 when cursor is in {textblock*}, 2 when cursor is in {3in}, 3 when cursor is in [0,0], and 4 when cursor is in (3in,1in). This will enable users to write functions that do specific things based on combination of position & delimiter type (we already know the type of delimiter I guess, looking at table returned by the current API). In my case, it will help me switch completion dictionary from argument to argument (because not all arguments accept same type of values).

Update (from prior observation): Also, this example \begin{textblock*}{3in}[0,0](3in,1in) shows we can have optional arguments mixed in between mandatory ones. Something I didn't think about in my previous comment in this thread: https://github.com/lervag/vimtex/issues/1983#issuecomment-793101475 (check the PostScript at the bottom of that comment).

lervag commented 3 years ago

First, a minor comment: it is a relatively large mental effort to parse big "walls of text", and since I can't focus on a single issue for a lot of time in one go, this can become a barrier for me wrt. focus on actually implementing anything. So, to get progress here, it would be very beneficial to get a short and concise summary of what we want done here. If it is a short proposal that makes sense, then we can also allow others to comment on it before I act.

So, I propose the following: we finish this discussion with the goal of having a short and simple definition of where we want to go, then we open a new issue where this is clearly stated where we/I can invite for some more comments before I implement the changes. I want to do this carefully, because I don't want to break things for anyone, and we are now discussing a relatively deep point in the VimTeX APIs.

So, my current idea of this thread is the following:

Feature request

vimtex#cmd#get*() is a family of commands to parse a LaTeX command. The parser is generic and greedy, and it does not guarantee that it parses the "right amount", but a core idea is that the matched content should always contain the actual command. As it stands, the matching is not good enough, because it is restricted to the following pattern \cmd[...]*{...}*. Instead, we should apply the following pattern: \cmd\([...]\|{...}\|(...)\)*.

We should also adhere to the following special cases and considerations:

The latter requirement is actually a little bit complicated. The current behaviour does not allow newlines.

poetaman commented 3 years ago

@lervag Sure, its good idea to have a proposal ticket with just a clear proposal!

Regarding your special considerations:

The latter requirement is actually a little bit complicated. The current behaviour does not allow newlines.

Yes I understand, it could be staggered, and perhaps lower priority. Most of the well formatted code I come across on https://tex.stackexchange.com/, (which also happens to be intuitive) looks like code below, where opening delimiter of new argument and closing delimiter of previous argument are on same line and have no whitespace in between (like }[ in code below). And you already support this (for \begin environments at least). So just changing the implementation to \cmd\([...]\|{...}\|(...)\|<...>\)*, and adding an attribute that conveys what argument number cursor is in might be good enough.

\begin{helloworld}[
    Extension = .otf,
    Path = fonts/LucidaSans/,
    UprightFont = *,
    BoldFont = *-Demi,
    ItalicFont = *-Italic,
    BoldItalicFont = *-DemiItalic
]
lervag commented 3 years ago

Very sorry about lack of progress here; it's simply been to much the last half year so this did not get high enough priority. But I finally got around to it and I've opened a PR. Let's continue the discussion there: #2197