dac is fooled by braced text following the command

lervag / vimtex

VimTeX: A modern Vim and neovim filetype plugin for LaTeX files.

MIT License

5.55k stars 391 forks source link

dac is fooled by braced text following the command #2628

Closed Aster89 closed 1 year ago

Aster89 commented 1 year ago

Description

Probably this "bug" has no solution, unless we have a list of \commands and the number of arguments they take. Or we can have some heuristics.

Indeed, this is a command, \textbf{url}, followed by some braced \emphasised text,

\textbf{url} {\em text}

whereas this is one command, i.e. a link to url with text an \emphasized text, and nothings else

\href{url}{\em text}

Currently dac interprets both in the latter way, so it correctly deletes the whole thing in the latter case, but mistakenly deletes the whole thing in the former case too.

I'm not sure what the correct behavior should be. Maybe we shoud determine the number of arguments of a command by requiring that two following args are not separated by just whitespace? For instance, something like this could be accomplished,

\cmd{}{} % 2-args command
\cmd{} {} % 1-arg command followed by braced stuff
    \cmd{}
    {} % 1-arg command followed by braced stuff
    \cmd{}%
    {} % 2-args command
    \cmd{}
{} % 2-args command

Steps to reproduce

Put this text in a TeX file

\textbf{some bold} {\em some emphasis}

then hit dac (cursor anywhere before the space).

Expected behavior

Only \textbf{some bold} should be deleted.

Actual behavior

All the \textbf{some bold} {\em some emphasis} is deleted.

Do you use a latexmkrc file?

VimtexInfo

System info:
  OS: Arch Linux
  Vim version: VIM 9.0 (1-1182)
  Has clientserver: true
  Servername: VIM1

VimTeX project: main
  base: main.tex
  root: /home/enrico/the-unfolding-of-language
  tex: /home/enrico/the-unfolding-of-language/main.tex
  main parser: fallback
  document class: book
  packages: amsbsy amsgen amsmath amsopn amstext array atbegshi-ltx atveryend-ltx auxhook babel biblatex bigintcalc bitset blx-case-expl3 booktabs cancel caption caption3 cleveref csquotes datatool-base datatool-fp defpattern emptypage environ epigraph epstopdf-base etexcmds etoolbox expl3 fancyhdr fontenc fontspec fontspec-luatex fp fp-addons fp-basic fp-eqn fp-eval fp-exp fp-pas fp-random fp-snap fp-trigo fp-upn gettitlestring glossaries glossaries-compatible-307 glossary-hypernav glossary-list glossary-long glossary-super glossary-tree graphics graphicx hycolor hyperref ifluatex ifoddpage iftex ifthen ifxetex imakeidx indentfirst infwarerr inputenc intcalc keyval kvdefinekeys kvoptions kvsetkeys l3keys2e letltxmacro lipsum listings logreq longtable lstmisc ltcaption ltxcmds makecell mfirstuc multicol nameref nextpage ninecolors pdfcol pdfescape pdftexcmds pgf pgfcomp-version-0-65 pgfcomp-version-1-18 pgfcore pgffor pgfkeys pgfmath pgfrcs pgfsys postnotes quoting refcount rerunfilecheck shellesc siunitx spath3 stringenc subfig substr supertabular tabularray tabularx tcolorbox textcase tikz tikzpagenodes tracklang translations translator trig trimspaces uniquecounter url verbatim xcolor xfor xfp xkeyval xparse
  source files:
    main.tex
    src/packages.tex
    src/glossary.tex
    src/custom-commands.tex
    ../../../usr/share/texmf-dist/tex/latex/tools/.tex
    src/introduction.tex
    src/chapters/a-castle-in-the-air.tex
    src/chapters/perpetual-motion.tex
    src/chapters/the-forces-of-destruction.tex
    src/chapters/a-reef-of-dead-metaphors.tex
    src/chapters/the-forces-of-creation.tex
    src/chapters/craving-for-order.tex
    src/chapters/the-unfolding-of-language.tex
    src/epilogue.tex
    img/munintumaa.tex
    src/appendixes/flipping-categories.tex
    src/appendixes/laryngeals-again.tex
    src/appendixes/the-devil-in-the-detail.tex
    src/appendixes/the-cook-s-counterpoint.tex
    src/appendixes/the-turkish-mirror.tex
  compiler: latexmk
    engine: -pdf
    options:
      -shell-escape
      -verbose
      -synctex=1
      -interaction=nonstopmode
    callback: 1
    continuous: 1
    executable: latexmk
  viewer: Zathura
    xwin id: 0
  qf method: LaTeX logfile using pplatex

lervag commented 1 year ago

Probably this "bug" has no solution, unless we have a list of \commands and the number of arguments they take. Or we can have some heuristics.

I would not call this a bug; rather, it is a quite hard limitation. There's been some discussions on this topic earlier. I could not quite find the issue threads. However, I remember the main conclusions:

What we really want is a command text object that matches \cmd[opt1]...[optn]{arg1}{arg2}...{argn} in a fully "semantic" manner. That is, we want to be aware of how many options there can be and how many arguments the command takes. However, as far as I know, there is no way to actually know this unless we 1) hard code every known/common command somehow, or 2) parse the preamble and packages for the relevant \newcommands and similar.

Doing 1) is possible, but it does imply a lot of work for maintaining the command data. And this is not something I want to do personally. And 2) seems even harder.

Instead, we/I decided to go for a more pragmatic choice: \cmd{....}{..} should be recognized as a single command. More generally, \cmd + a group of [...] + a group of {...} separated only by whitespace should be assumed to be a command with options and arguments. This is obviuosly wrong in many cases (as you already point out). However, every such assumption/approximation will be wrong in some way or another, and as far as I can tell, the current behaviour should be closer to the right behaviour than most other possibilities.

Aster89 commented 1 year ago

@lervag yep, I see that. That is why I proposed to rely on spaces. How many people would put one or more spaces in between two arguments of a command (without a line break first)?

Plus, we could have an option to govern this variation of a non perfect solution.

lervag commented 1 year ago

@lervag yep, I see that. That is why I proposed to rely on spaces. … Plus, we could have an option to govern this variation of a non perfect solution.

Ok, yes, that's a nice idea. The relevant code is here:

https://github.com/lervag/vimtex/blob/8ef5b0d9b85e7bb57b00b4571db4804aec0e10eb/autoload/vimtex/cmd.vim#L691-L702

It would not be too hard to add an option to specify the rules here. Something like this:

*g:vimtex_parser_cmd_allow_spaces*
  VimTeX has a command parser that is used e.g. for the command text object.
  The parsing of a cmd relies on a simple heuristic that any command looks like
  this:

    \cmd<overlay>[opts]...{args}...

  That is, a command is parsed "greedily" with the assumption that every
  following `[opt]` and `{arg}` groups should be considered part of the
  command.

  This is clearly wrong in many situations, but it is the best we can do
  without having knowledge about the particular commands we are parsing.

  The behaviour is configurable, though, in the sense that we can specify how
  whitespaces between the argument groups `{...}` behave. By default, we ignore
  every whitespace character, including a single line break. If this option is
  set to 0 or |v:false|, any whitespace will break the greedy parser. Thus, for
  this command:

    \foo{b|ar} {baz}

  With the option set to |v:false| and the cursor at `|`, the parser would only
  capture `\foo{bar}`.

  Default value: v:true

How many people would put one or more spaces in between two arguments of a command (without a line break first)?

Heh, I've stopped thinking like that. It turns out, if I make assumptions on what someone will or will not do, I'm usually wrong. So, in cases like this, I try and consider the actual possible solutions and then choose one that seems most sensible. Right now, it seems you are making a sensible suggestion for improving things. :)

lervag commented 1 year ago

I would be happy to get some help with this text and how to present the behaviour.

@kiryph Do I remember correctly that you were part of the discussions earlier where the current behaviour was determined?

kiryph commented 1 year ago

@lervag Unfortunately, I also do not have a clear memory about this anymore.

According to github search, the textobjects ic and ac were requested in issue #244 (2015) where you expressed some thoughts

[...] And you are right, it seems a good idea to add commands as text objects as well. The question is how this should work, especially for commands that take several arguments. I guess:

ac should be the entire command, including (all) arguments

ic should be only the content of the arguments, perhaps from (but excluding) the first { and to (excluding) the last }

I had requested surround-like behavour with dsc and csc in https://github.com/lervag/vimtex/issues/97 where I avoided this issue by considering only simple commands. For example \textcolor{red}{my text} is a prime candidate for a more intelligent dsc. The most likely intention is to change to my text.

A vaguely related matter was about folding of certain commands, which has lead to a configuration dictionary where a user can specify how to fold certain commands (https://github.com/lervag/vimtex/issues/828):

:h g:vimtex_fold_types

[...]
    <cmd_single>        Fold long commands with a single argument. E.g.:  

                          \hypersetup{    --->    \hypersetup{...}
                            option 1,
                            ...,
                            option n
                            }

    <cmd_single_opt>    Fold commands that opens with a single long optional
                        argument that is followed by a short "real" argument.
                        E.g.:  

                          \usepackage[    --->    \usepackage[...]{name}
                            option 1,
                            ...,
                            option n
                            ]{name}

    <cmd_multi>         Fold commands that start with a short regular argument
                        and continue with long optional and/or regular
                        arguments. E.g.:  

                          \newcommand{\xx}[3]{    --->    \newcommand{\xx} ...
                            Hello #1, #2, and #3.
                          }

However, I agree that this is not an advisable route for the text objects, even though this dictionary could be exploited for the text objects.

If I recollect it correctly, I avoided the matter for the text objects based on targets.vim by relying on the existing VimTeX internal methods used for the standard text objects in https://github.com/lervag/vimtex/pull/1384 (not 100% certain about my recollection and I did not quickly find a discussion in the PR or related issues).

Back to the actual issue of ic and ac

I think using the proposed way of formatting code to enable better text objects in VimTeX can be an improvement since a perfect solution is not possible:

\cmd{}{} % 2-args command
\cmd{} {} % 1-arg command followed by braced stuff
   \cmd{}
   {} % 1-arg command followed by braced stuff
   \cmd{}%
   {} % 2-args command
   \cmd{}
{} % 2-args command

However, I am not sure about all suggestions:

Using the comment sign to indicate that the command does not end means a user is not allowed to add a personal comment at the end of a line where this is not the case. One could add a special comment % cmd-cont. which, however, clutters the code and hence is not great.
I can imagine that the last example is not always a 2-args command.
```
   \cmd{}
{} % 2-args command
```
I also doubt that tools like latexindent.pl play nice here. But I would leave this to another issue.

I guess a first improvement can be indeed to consider { as command termination combination (if on the same line).

In a new line indentation can be added as formatting and not as command termination. One could resolve this by getting the vim setting shiftwidth to strip multiples of shiftwidth to see if there is a single additional space in the new line to indicate the end of the command.

For example the user sets shiftwidth=2 and the newline has three spaces as leading whitespace

\begin{quote}
  \cmd{1st arg}
   {new block}
\end{quote}

then {new block} would not be part of the command of the previous line. The same example but with one space less

\begin{quote}
  \cmd{1st arg}
  {2nd arg}
\end{quote}

would mean the command has not ended and there is another mandatory argument {2nd arg}.

However, I fear that autoformatting tools might remove the additional space when applied.

lervag commented 1 year ago

@lervag Unfortunately, I also do not have a clear memory about this anymore.

According to …

Thanks! Perhaps you did not have a clear memory, but at least you did a good job of tracing old discussions!

I had requested surround-like behavour with dsc and csc in #97 where I avoided this issue by considering only simple commands. For example \textcolor{red}{my text} is a prime candidate for a more intelligent dsc. The most likely intention is to change to my text.

Although this is slightly off topic, I find it interesting and want to make a comment: I would be very surprised if there are not many examples of commands where the same behaviour for dsc would not be what you want. Still, I agree that your suggestion for dsc behaviour may be the most pragmatic choice, at least in some sense. For sake of completeness: The current behaviour is to remove the command and delimiters if there is only a single argument, but to only delete the command when there are multiple arguments.

Back to the actual issue of ic and ac

I agree with most of your comments. To be short; how about this as a proposed solution: No new option, instead, change the current behaviour to this (and properly document it):

\cmd{}{}...{} % n-args
\cmd{}{} {}   % 2-args
\cmd{} {}{}   % 1-arg
\cmd{}
{}            % 2-args (when next line braces align according to indent rules)
\cmd{}
  {}          % 1-arg (if not)

If I remember correctly, indent rules are always to put the next { on the same level as the previous lines indent. We can easily use the indent function VimtexIndent(lnum) to calculate the expected indents.

However, I fear that autoformatting tools might remove the additional space when applied.

Yes; it is clear that the case with newlines is hard. Perhaps we can split the decision in two:

Do we agree that, for single line commands, the extra space can be used to end the greedy argument parser? I think this should be quite uncontroversial and would solve 90% of this issue.
Is the above suggestion OK for how to handle the newline case? If not, are there better ideas? Should we use an option to allow different styles?

Aster89 commented 1 year ago

I agree with the one-line cases, and I too think that it's unconstroversial
I hadn't thought about external formatters in the first place, so I have a less clear idea, but... one way to wash our hands of it is to let the user specify the regex of the substring that ends the greedy argument parser. The problem would be that we can't tell if the user inserted an invalid regex, but this is ultimately up to the user, no?

lervag commented 1 year ago

How about putting it all in the hands of the user? From the core side, I would implement it something like this:

if ! call(g:vimtex_cmd_separator_func, [s:text_between(a:start_pos, l:open) ])
  return {}
endif

We keep the current default with the change suggested in 1 from earlier and implement it with a function like this:

" If user returns v:false or 0, then the greedy parser is aborted.
function! MyCmdSeparatorRule(separator_string)
  let l:lines = split(a:separator_string, "\n", v:true)

  return empty(l:lines[0])
        \ && len(l:lines) <= 2
        \ && empty(substitute(l:lines[1], '\s\+', '', 'g'))
endfunction

This should give a lot of flexibility to the user; and if we give a simple example of how to customize, it should not be too hard.

lervag commented 1 year ago

I've implemented an initial version of this in the branch issue/2628. I've not opened a PR for it yet, but feel free to look into it/test it!

Aster89 commented 1 year ago

Sorry I've not undestood how to use it. I've tried with defining MyCmdSeparatorRule in my vimrc together with let g:vimtex_parser_cmd_separator_check = 'MyCmdSeparatorRule', but I get errors telling me that 1 is an index out of range. Indeed, if len(l:lines) <= 2 is true, that doesnt guarantee that the second operand (to the second && is valid.

lervag commented 1 year ago

So, either I'm wrong and this is a bad idea, or we need some further work on explaining and on the documentation (non existant right now).

Can you show me the function you are using now that does not work?

Aster89 commented 1 year ago

Read the last line of this comment first.

No, I haven't written it. I picked yours:

function! MyCmdSeparatorRule(separator_string)
  let l:lines = split(a:separator_string, "\n", v:true)

  return empty(l:lines[0])
        \ && len(l:lines) <= 2
        \ && empty(substitute(l:lines[1], '\s\+', '', 'g'))
endfunction

defined the global variable

let g:vimtex_cmd_separator_func = 'MyCmdSeparatorRule'

and gave it a try by pressing dac with the cursor on the x in the text below

\documentclass{article}

\begin{document}
\textcolor{red} {caio}
\end{document}

Since it errored, I inserted echom '<' . a:separator_string . '>' as a first line in MyCmdSeparatorRule, and I saw it prints <> when I try dac.

That's from memory. But now that I've tried again, I can't reproduce it. In the sense that it doesn't error. dac just makes the whole line empty. I'll give a look at it again with a fresher mind.

lervag commented 1 year ago

You've made some errors here, e.g. using a wrong option name. But I also made an error in the suggested function. This should work:

set nocompatible
set runtimepath^=~/.local/plugged/vimtex
set runtimepath+=~/.local/plugged/vimtex/after
filetype plugin indent on
syntax enable

nnoremap q :qall!<cr>

function! MyCmdSeparatorRule(separator_string)
  let l:lines = split(a:separator_string, "\n", v:true)

  return empty(l:lines[0])
        \ && len(l:lines) <= 2
        \ && (len(l:lines) == 1 || empty(substitute(l:lines[1], '\s\+', '', 'g')))
endfunction

let g:vimtex_parser_cmd_separator_check = 'MyCmdSeparatorRule'

silent edit test.tex

Now with nvim --clean -u test.vim we should open test.tex and we should have something closer to the desired behaviour.

lervag commented 1 year ago

So, as far as I can tell, my proposed solution should work. But we may want to consider a better name for the option and we need good documentation for this with one or more good examples.

Aster89 commented 1 year ago

Oh, so g:vimtex_parser_cmd_separator_check is fed with the string between each adjacent portions of a command, which is named separator_string. And it should return the boolean telling whether the calling code should proceed in parsing.

Ok, I see how the default implementation works and how the one in your last message works.

Looks good!

lervag commented 1 year ago

I've 1) changed the default to the one we've discussed so far and 2) created an initial version of the docs. Since this is a breaking change, I would appreciate some feedback on this before I merge.

@kiryph @Aster89

Aster89 commented 1 year ago

I tried, it looks ok. I mean, the default is fine, and the user can customize as they like.

So I guess the most important thing to review is the doc itself.

I'd go for

This option specifies the policy for deciding whether successive `[arg]`/`{arg}`s following
a `\command` are arguments to that `\command`.

In fact, parsing a LaTeX command without additional knowledge, is a hard problem.
When we read `\foo{bar}{baz}` — is `{baz}` going to be consumed as an argument
to `\foo`? The only way to know this is to read the definition of the `\foo` command/macro.

A pragmatic choice when we write a parser, therefore, is to rely on some heuristics and
common practises. This will never be perfect, but it can be good enough for practical use.
In VimTeX, the core heuristics are that a command will look like this:
>tex
    \foo<overlay>[[opt]{arg}]...
    \begin{name}<overlay>[[opt]{arg}]...
<
and the parser greedily swallows as many groups of `[opts]` and `{args}`
as possible as long as the function specified via this option returns true
for the text between successive such groups.

The default function will allow a line break and possibly white space on
the preceding line before a new group.

... and so on

lervag commented 1 year ago

Thanks! I've pushed with your suggested improvements to the docs. I appreciate the feedback; feel free to add additional feedback, I'll be happy to make further updates.