Closed Aster89 closed 1 year ago
Probably this "bug" has no solution, unless we have a list of `\commands` and the number of arguments they take. Or we can have some heuristics.
I would not call this a bug; rather, it is quite a hard limitation. There have been some discussions on this topic earlier. I could not find the issue threads, but I remember the main conclusions:
What we really want is a command text object that matches `\cmd[opt1]...[optn]{arg1}{arg2}...{argn}` in a fully "semantic" manner. That is, we want to be aware of how many options there can be and how many arguments the command takes. However, as far as I know, there is no way to actually know this unless we 1) hard code every known/common command somehow, or 2) parse the preamble and packages for the relevant `\newcommand`s and similar.
Doing 1) is possible, but it does imply a lot of work for maintaining the command data. And this is not something I want to do personally. And 2) seems even harder.
Instead, we/I decided to go for a more pragmatic choice: `\cmd{....}{..}` should be recognized as a single command. More generally, `\cmd` + a group of `[...]` + a group of `{...}` separated only by whitespace should be assumed to be a command with options and arguments. This is obviously wrong in many cases (as you already point out). However, every such assumption/approximation will be wrong in some way or another, and as far as I can tell, the current behaviour should be closer to the right behaviour than most other possibilities.
@lervag yep, I see that. That is why I proposed to rely on spaces. How many people would put one or more spaces in between two arguments of a command (without a line break first)?
Plus, we could have an option to govern this variation of an imperfect solution.
@lervag yep, I see that. That is why I proposed to rely on spaces. … Plus, we could have an option to govern this variation of an imperfect solution.
Ok, yes, that's a nice idea. The relevant code is here:
It would not be too hard to add an option to specify the rules here. Something like this:
*g:vimtex_parser_cmd_allow_spaces*
VimTeX has a command parser that is used e.g. for the command text object.
The parsing of a cmd relies on a simple heuristic that any command looks like
this:
\cmd<overlay>[opts]...{args}...
That is, a command is parsed "greedily" with the assumption that every
following `[opt]` and `{arg}` group should be considered part of the
command.
This is clearly wrong in many situations, but it is the best we can do
without having knowledge about the particular commands we are parsing.
The behaviour is configurable, though, in the sense that we can specify how
whitespace between the argument groups `{...}` is treated. By default, we ignore
every whitespace character, including a single line break. If this option is
set to 0 or |v:false|, any whitespace will break the greedy parser. Thus, for
this command:
\foo{b|ar} {baz}
With the option set to |v:false| and the cursor at `|`, the parser would only
capture `\foo{bar}`.
Default value: v:true
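To make the heuristic in this draft concrete, here is a minimal Vim script sketch of a greedy parser with such a switch. It is not VimTeX's actual implementation: `GreedyCmdSketch()` is a hypothetical helper, it only looks at a single string without line breaks, and it handles neither nested braces nor `<overlay>` groups.

```vim
" Hypothetical helper (not part of VimTeX): return the command at the start
" of a:text together with every [..] and {..} group the greedy rule accepts.
function! GreedyCmdSketch(text, allow_spaces) abort
  " The command name: a backslash followed by letters, e.g. \foo
  let l:cmd = matchstr(a:text, '^\\\a\+')
  if empty(l:cmd)
    return ''
  endif

  " A group is [..] or {..} without nesting (a deliberate simplification).
  " When spaces are allowed, whitespace may precede the group.
  let l:group = (a:allow_spaces ? '^\s*' : '^') . '\%(\[[^\]]*\]\|{[^}]*}\)'

  let l:rest = a:text[strlen(l:cmd) :]
  while v:true
    let l:next = matchstr(l:rest, l:group)
    if empty(l:next)
      break
    endif
    let l:cmd .= l:next
    let l:rest = l:rest[strlen(l:next) :]
  endwhile

  return l:cmd
endfunction

" Whitespace ignored: prints \foo{bar} {baz}
echo GreedyCmdSketch('\foo{bar} {baz}', v:true)
" Whitespace breaks the parser: prints \foo{bar}
echo GreedyCmdSketch('\foo{bar} {baz}', v:false)
```

The second call corresponds to the example in the draft, where only `\foo{bar}` is captured once whitespace is no longer skipped.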
How many people would put one or more spaces in between two arguments of a command (without a line break first)?
Heh, I've stopped thinking like that. It turns out, if I make assumptions on what someone will or will not do, I'm usually wrong. So, in cases like this, I try and consider the actual possible solutions and then choose one that seems most sensible. Right now, it seems you are making a sensible suggestion for improving things. :)
I would be happy to get some help with this text and how to present the behaviour.
@kiryph Do I remember correctly that you were part of the discussions earlier where the current behaviour was determined?
@lervag Unfortunately, I also do not have a clear memory about this anymore.
According to GitHub search, the text objects `ic` and `ac` were requested in issue #244 (2015), where you expressed some thoughts:
[...] And you are right, it seems a good idea to add commands as text objects as well. The question is how this should work, especially for commands that take several arguments. I guess:
ac should be the entire command, including (all) arguments
ic should be only the content of the arguments, perhaps from (but excluding) the first { and to (excluding) the last }
I had requested surround-like behaviour with `dsc` and `csc` in https://github.com/lervag/vimtex/issues/97 where I avoided this issue by considering only simple commands. For example, `\textcolor{red}{my text}` is a prime candidate for a more intelligent `dsc`. The most likely intention is to change to `my text`.
A vaguely related matter was about folding of certain commands, which has led to a configuration dictionary where a user can specify how to fold certain commands (https://github.com/lervag/vimtex/issues/828):
:h g:vimtex_fold_types
[...]
<cmd_single> Fold long commands with a single argument. E.g.:
\hypersetup{ ---> \hypersetup{...}
option 1,
...,
option n
}
<cmd_single_opt> Fold commands that opens with a single long optional
argument that is followed by a short "real" argument.
E.g.:
\usepackage[ ---> \usepackage[...]{name}
option 1,
...,
option n
]{name}
<cmd_multi> Fold commands that start with a short regular argument
and continue with long optional and/or regular
arguments. E.g.:
\newcommand{\xx}[3]{ ---> \newcommand{\xx} ...
Hello #1, #2, and #3.
}
However, I agree that this is not an advisable route for the text objects, even though this dictionary could be exploited for the text objects.
If I recollect it correctly, I avoided the matter for the text objects based on targets.vim by relying on the existing VimTeX internal methods used for the standard text objects in https://github.com/lervag/vimtex/pull/1384 (not 100% certain about my recollection and I did not quickly find a discussion in the PR or related issues).
Back to the actual issue of `ic` and `ac`
I think using the proposed way of formatting code to enable better text objects in VimTeX can be an improvement since a perfect solution is not possible:
\cmd{}{} % 2-args command
\cmd{} {} % 1-arg command followed by braced stuff
\cmd{}
 {} % 1-arg command followed by braced stuff
\cmd{}%
{} % 2-args command
\cmd{}
{} % 2-args command
However, I am not sure about all suggestions:
% cmd-cont.
which, however, clutters the code and hence is not great.
\cmd{}
{} % 2-args command
I guess a first improvement could indeed be to treat ` {` (whitespace followed by an opening brace) as the combination that terminates a command, if it is on the same line.
On a new line, indentation can be added as formatting and not as command termination. One could resolve this by reading the Vim setting `shiftwidth` and stripping multiples of `shiftwidth` to see if there is a single additional space on the new line that indicates the end of the command.
For example, the user sets `shiftwidth=2` and the new line has three spaces as leading whitespace:
\begin{quote}
  \cmd{1st arg}
   {new block}
\end{quote}
then `{new block}` would not be part of the command of the previous line.
The same example, but with one space fewer,
\begin{quote}
  \cmd{1st arg}
  {2nd arg}
\end{quote}
would mean the command has not ended and there is another mandatory argument `{2nd arg}`.
However, I fear that autoformatting tools might remove the additional space when applied.
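The `shiftwidth` idea above can be sketched in a few lines of Vim script. This is only an illustration under stated assumptions: `ExtraSpaceEndsCmdSketch()` is a hypothetical name, it counts leading spaces only (no tabs), and it takes the shift width as an argument so that it can be tried outside of a buffer.

```vim
" Hypothetical helper (not part of VimTeX): return 1 if, after stripping
" whole multiples of a:sw from the leading spaces of a:line, exactly one
" space is left over. Under the proposal, that extra space would mark the
" end of the command on the previous line.
function! ExtraSpaceEndsCmdSketch(line, sw) abort
  let l:lead = strlen(matchstr(a:line, '^ *'))
  return l:lead % a:sw == 1
endfunction

" Three leading spaces with a shift width of 2: prints 1 (command ended)
echo ExtraSpaceEndsCmdSketch('   {new block}', 2)
" Two leading spaces with a shift width of 2: prints 0 (still an argument)
echo ExtraSpaceEndsCmdSketch('  {2nd arg}', 2)
```

In a real buffer one would presumably pass `shiftwidth()` as the second argument. With a shift width of 1 the rule can never trigger, which is one more reason to treat it as a convention rather than a reliable signal.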
@lervag Unfortunately, I also do not have a clear memory about this anymore.
According to …
Thanks! Perhaps you did not have a clear memory, but at least you did a good job of tracing old discussions!
I had requested surround-like behaviour with `dsc` and `csc` in #97 where I avoided this issue by considering only simple commands. For example `\textcolor{red}{my text}` is a prime candidate for a more intelligent `dsc`. The most likely intention is to change to `my text`.
Although this is slightly off topic, I find it interesting and want to make a comment: I would be very surprised if there were not many commands for which this `dsc` behaviour is not what you want. Still, I agree that your suggestion for `dsc` behaviour may be the most pragmatic choice, at least in some sense. For the sake of completeness: the current behaviour is to remove the command and delimiters if there is only a single argument, but to only delete the command when there are multiple arguments.
Back to the actual issue of `ic` and `ac`
I agree with most of your comments. In short, how about this as a proposed solution: no new option; instead, change the current behaviour to this (and properly document it):
\cmd{}{}...{} % n-args
\cmd{}{} {} % 2-args
\cmd{} {}{} % 1-arg
\cmd{}
{} % 2-args (when next line braces align according to indent rules)
\cmd{}
{} % 1-arg (if not)
If I remember correctly, indent rules are always to put the next `{` on the same level as the previous line's indent. We can easily use the indent function `VimtexIndent(lnum)` to calculate the expected indents.
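As a rough illustration of the newline rule, the alignment check could look like the sketch below. It assumes that the expected indent of the next `{` equals the indent of the line holding the command, as recalled above; a real implementation would presumably compare against `VimtexIndent(lnum)` instead. `AlignedGroupContinuesSketch()` is a hypothetical name.

```vim
" Hypothetical helper (not part of VimTeX): return 1 if a brace group at the
" start of a:next_line is aligned with a:cmd_line, i.e. both lines have the
" same leading whitespace, so that the group counts as another argument.
function! AlignedGroupContinuesSketch(cmd_line, next_line) abort
  let l:cmd_lead = matchstr(a:cmd_line, '^\s*')
  let l:next_lead = matchstr(a:next_line, '^\s*')
  return l:cmd_lead ==# l:next_lead
endfunction

" Aligned braces: prints 1 (2-args)
echo AlignedGroupContinuesSketch('  \cmd{}', '  {}')
" One extra space: prints 0 (1-arg)
echo AlignedGroupContinuesSketch('  \cmd{}', '   {}')
```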
However, I fear that autoformatting tools might remove the additional space when applied.
Yes; it is clear that the case with newlines is hard. Perhaps we can split the decision in two:
1. Do we agree that, for single line commands, the extra space can be used to end the greedy argument parser? I think this should be quite uncontroversial and would solve 90% of this issue.
2. Is the above suggestion OK for how to handle the newline case? If not, are there better ideas? Should we use an option to allow different styles?
How about putting it all in the hands of the user? From the core side, I would implement it something like this:
if ! call(g:vimtex_cmd_separator_func, [s:text_between(a:start_pos, l:open) ])
  return {}
endif
We keep the current default with the change suggested in 1 from earlier and implement it with a function like this:
" If user returns v:false or 0, then the greedy parser is aborted.
function! MyCmdSeparatorRule(separator_string)
  let l:lines = split(a:separator_string, "\n", v:true)
  return empty(l:lines[0])
        \ && len(l:lines) <= 2
        \ && empty(substitute(l:lines[1], '\s\+', '', 'g'))
endfunction
This should give a lot of flexibility to the user; and if we give a simple example of how to customize, it should not be too hard.
I've implemented an initial version of this in the branch issue/2628. I've not opened a PR for it yet, but feel free to look into it/test it!
Sorry, I've not understood how to use it. I've tried defining `MyCmdSeparatorRule` in my vimrc together with `let g:vimtex_parser_cmd_separator_check = 'MyCmdSeparatorRule'`, but I get errors telling me that 1 is an index out of range. Indeed, if `len(l:lines) <= 2` is true, that doesn't guarantee that the second operand (to the second `&&`) is valid.
So, either I'm wrong and this is a bad idea, or we need some further work on explaining and on the documentation (non-existent right now).
Can you show me the function you are using now that does not work?
Read the last line of this comment first.
No, I haven't written it. I picked yours:
function! MyCmdSeparatorRule(separator_string)
  let l:lines = split(a:separator_string, "\n", v:true)
  return empty(l:lines[0])
        \ && len(l:lines) <= 2
        \ && empty(substitute(l:lines[1], '\s\+', '', 'g'))
endfunction
defined the global variable
let g:vimtex_cmd_separator_func = 'MyCmdSeparatorRule'
and gave it a try by pressing `dac` with the cursor on the `x` in the text below
\documentclass{article}
\begin{document}
\textcolor{red} {caio}
\end{document}
Since it errored, I inserted `echom '<' . a:separator_string . '>'` as a first line in `MyCmdSeparatorRule`, and I saw it prints `<>` when I try `dac`.
That's from memory. But now that I've tried again, I can't reproduce it, in the sense that it doesn't error: `dac` just makes the whole line empty. I'll take another look at it with a fresher mind.
You've made some errors here, e.g. using a wrong option name. But I also made an error in the suggested function. This should work:
set nocompatible
set runtimepath^=~/.local/plugged/vimtex
set runtimepath+=~/.local/plugged/vimtex/after
filetype plugin indent on
syntax enable
nnoremap q :qall!<cr>
function! MyCmdSeparatorRule(separator_string)
  let l:lines = split(a:separator_string, "\n", v:true)
  return empty(l:lines[0])
        \ && len(l:lines) <= 2
        \ && (len(l:lines) == 1 || empty(substitute(l:lines[1], '\s\+', '', 'g')))
endfunction
let g:vimtex_parser_cmd_separator_check = 'MyCmdSeparatorRule'
silent edit test.tex
Now with `nvim --clean -u test.vim` we should open `test.tex` and we should have something closer to the desired behaviour.
So, as far as I can tell, my proposed solution should work. But we may want to consider a better name for the option and we need good documentation for this with one or more good examples.
Oh, so the function named by `g:vimtex_parser_cmd_separator_check` is fed the string between each pair of adjacent portions of a command, which is named `separator_string`. And it should return a boolean telling whether the calling code should proceed with parsing.
Ok, I see how the default implementation works and how the one in your last message works.
Looks good!
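Given that contract (the function receives the text between two adjacent groups and returns whether parsing should continue), a stricter rule than the one above could be sketched as follows. `StrictCmdSeparatorRule` is a made-up name for illustration, and the option name is the one used in this thread, which may still change.

```vim
" Hypothetical rule: accept a following group only if it comes directly
" after the previous one, with no whitespace or line break in between.
function! StrictCmdSeparatorRule(separator_string) abort
  return empty(a:separator_string)
endfunction

" Option name as used in this thread; assumed, not final.
let g:vimtex_parser_cmd_separator_check = 'StrictCmdSeparatorRule'
```

With such a rule, `\cmd{a}{b}` would still be treated as one command, while `\cmd{a} {b}` and a `{b}` on the next line would both end the command after `{a}`.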
I've 1) changed the default to the one we've discussed so far and 2) created an initial version of the docs. Since this is a breaking change, I would appreciate some feedback on this before I merge.
@kiryph @Aster89
I tried, it looks ok. I mean, the default is fine, and the user can customize as they like.
So I guess the most important thing to review is the doc itself.
I'd go for
This option specifies the policy for deciding whether successive `[arg]`/`{arg}`s following
a `\command` are arguments to that `\command`.
Parsing a LaTeX command without additional knowledge is, in fact, a hard problem.
When we read `\foo{bar}{baz}`, is `{baz}` going to be consumed as an argument
to `\foo`? The only way to know this is to read the definition of the `\foo` command/macro.
A pragmatic choice when we write a parser, therefore, is to rely on some heuristics and
common practices. This will never be perfect, but it can be good enough for practical use.
In VimTeX, the core heuristics are that a command will look like this:
>tex
\foo<overlay>[[opt]{arg}]...
\begin{name}<overlay>[[opt]{arg}]...
<
and the parser greedily swallows as many groups of `[opts]` and `{args}`
as possible as long as the function specified via this option returns true
for the text between successive such groups.
The default function will allow a line break and possibly white space on
the preceding line before a new group.
... and so on
Thanks! I've pushed with your suggested improvements to the docs. I appreciate the feedback; feel free to add additional feedback, I'll be happy to make further updates.
Description
Probably this "bug" has no solution, unless we have a list of `\commands` and the number of arguments they take. Or we can have some heuristics.
Indeed, this is a command, `\textbf{url}`, followed by some braced `\em`phasised text,
whereas this is one command, i.e. a link to `url` with text an `\em`phasized text, and nothing else.
Currently dac interprets both in the latter way, so it correctly deletes the whole thing in the latter case, but mistakenly deletes the whole thing in the former case too.
I'm not sure what the correct behavior should be. Maybe we should determine the number of arguments of a command by requiring that two following args are not separated by just whitespace? For instance, something like this could be accomplished,
Steps to reproduce
Put this text in a TeX file
then hit dac (cursor anywhere before the space).
Expected behavior
Only `\textbf{some bold}` should be deleted.
Actual behavior
All the `\textbf{some bold} {\em some emphasis}` is deleted.
Do you use a latexmkrc file?
No
VimtexInfo