`&diffexpr` does not specify which buffers are being diffed when more than 2 buffers are diffed together. #9641

Open iago-lito opened 2 years ago

iago-lito commented 2 years ago

Is your feature request about something that is currently impossible or hard to do? Please describe the problem.

According to :h diff-diffexpr, the variables v:fname_{in,new,out} are available within the custom diffing expression to locate the files currently being processed on disk. Unfortunately, this information is not sufficient to find out which buffers actually correspond to in and new, especially when diffing more than 2 buffers together. This prevents me from customizing &diffexpr the way I want, with specific markers set in the various buffers to force alignment of specific lines.

Describe the solution you'd like

New variables v:bufnr_in and v:bufnr_new would be available within the expression, pointing to the buffers currently being diffed.

Describe alternatives you've considered

Instead of shelling out and performing I/O, &diffexpr would directly invoke the internal diff engine in a custom way, with v:bufnr_in and v:bufnr_new still being available.

Additional context

My ultimate intent is to be able to mark specific lines when diffing two misaligned files A and B. E.g. after marking lines A:27 and B:44, the files would be split into three parts that are diffed independently (i.e. A:1-26 against B:1-43, A:27-27 against B:44-44 and A:28-end against B:45-end), and the results would then be concatenated into one single effective diff. This way, I would be able to "force" alignment of certain lines the way I want.
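As a rough sketch of the splitting step described above, the following Vim script computes the line ranges that would be diffed independently, given a list of marked line pairs. The function name SplitAtAnchors and its argument layout are hypothetical and introduced only for illustration; nothing like it exists in Vim.

```vim
" Hypothetical helper, not part of Vim.
" a:anchors is a list of [lnum_a, lnum_b] pairs, assumed to be strictly
" increasing on both sides. a:len_a and a:len_b are the line counts.
" Returns a list of [[first_a, last_a], [first_b, last_b]] segments.
" An empty range is encoded as last == first - 1.
function! SplitAtAnchors(anchors, len_a, len_b) abort
  let segments = []
  let start_a = 1
  let start_b = 1
  for [lnum_a, lnum_b] in a:anchors
    " Everything before the marked lines forms one segment.
    call add(segments, [[start_a, lnum_a - 1], [start_b, lnum_b - 1]])
    " The marked lines form a one-line segment of their own.
    call add(segments, [[lnum_a, lnum_a], [lnum_b, lnum_b]])
    let start_a = lnum_a + 1
    let start_b = lnum_b + 1
  endfor
  " Whatever follows the last marked pair is the final segment.
  call add(segments, [[start_a, a:len_a], [start_b, a:len_b]])
  return segments
endfunction

" Marking A:27 and B:44 in files of 100 and 120 lines:
echo SplitAtAnchors([[27, 44]], 100, 120)
```

With the single marked pair from the example, this yields the three segments A:1-26 / B:1-43, A:27-27 / B:44-44 and A:28-100 / B:45-120.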

brammool commented 2 years ago

Is the diffing you want to do performed by an external program, or something you would do inside Vim? In the latter case it might be useful to not write the text to files at all, and let the expression read the buffers directly. Not sure where the output would go; instead of a file it could be a list, perhaps. This would use some other option than 'diffexpr', since Vim needs to know what to do.

chrisbra commented 2 years ago

Slightly related to the second option: https://github.com/vim/vim/issues/4241 and https://github.com/vim/vim/issues/4191#issuecomment-478213781

iago-lito commented 2 years ago

@brammool Ideally, it would be something I would do inside Vim. I have been falling back on external programs so far only because this is what &diffexpr explicitly requires. As you say, if the given expression is internal to Vim, it is unclear what to do with the output. This is why I think another option would be beneficial, so that Vim knows what to do with the output string.

In a nutshell, the new option would work exactly like &diffexpr, except that:

it would not read in and new from disk, but from the buffers instead.
it would not write out to disk, but return a string instead.
it would not execute an external command, but vimscript instead.

All in all, the new option would be more powerful, because the current behaviour of &diffexpr could very well be reimplemented in terms of it.

brammool commented 2 years ago

We can also avoid generating the diff output and parsing it back. For every diff hunk we need:

line number in original
line count in original
line number in new file
line count in new file

Thus it would be a list of lists. When deleting line 4 and inserting a line below 8 you would have: [[4, 1, 4, 0], [8, 0, 7, 1]]

On the Vim side this isn't too difficult. Main work is to check the diff hunks are valid.
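To illustrate what such a validity check might involve, here is a minimal sketch in Vim script. The rules it enforces (four non-negative numbers per hunk, at least one non-zero count, hunks ordered and non-overlapping, non-empty ranges inside the file) are assumptions, and HunksAreValid is a hypothetical name. The checks Vim would actually perform are not specified in this thread.

```vim
" Hypothetical check for a list of
" [orig_lnum, orig_count, new_lnum, new_count] hunks.
" The rules below are assumptions, not Vim's actual checks.
function! HunksAreValid(hunks, len_orig, len_new) abort
  " First line on each side that the next hunk may start at.
  let next_orig = 0
  let next_new = 0
  for hunk in a:hunks
    if type(hunk) != v:t_list || len(hunk) != 4
      return v:false
    endif
    for field in hunk
      if type(field) != v:t_number || field < 0
        return v:false
      endif
    endfor
    let [orig_lnum, orig_count, new_lnum, new_count] = hunk
    " A hunk must change something.
    if orig_count == 0 && new_count == 0
      return v:false
    endif
    " Hunks must be ordered and must not overlap on either side.
    if orig_lnum < next_orig || new_lnum < next_new
      return v:false
    endif
    " A non-empty range must lie inside its file.
    if orig_count > 0
          \ && (orig_lnum < 1 || orig_lnum + orig_count - 1 > a:len_orig)
      return v:false
    endif
    if new_count > 0
          \ && (new_lnum < 1 || new_lnum + new_count - 1 > a:len_new)
      return v:false
    endif
    let next_orig = orig_lnum + orig_count
    let next_new = new_lnum + new_count
  endfor
  return v:true
endfunction

" The example from the comment above, for two 10-line files:
echo HunksAreValid([[4, 1, 4, 0], [8, 0, 7, 1]], 10, 10)
```

Under these assumed rules the example list from the comment is accepted, while a list whose hunks are out of order or reach past the end of a file is rejected.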

/// Bram Moolenaar -- @.*** -- http://www.Moolenaar.net \\ /// \\ \\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ /// \\ help me help AIDS victims -- http://ICCF-Holland.org ///

iago-lito commented 2 years ago

> When deleting line 4 and inserting a line below 8 you would have: [[4, 1, 4, 0], [8, 0, 7, 1]]

Oh, I understand that this is the internal representation of the diff actually used within vim when displaying the diffed buffers, right? Is this actually produced by internal calls to xdiff functions?

What about "changed" lines? Where can I find documentation about this format?

Are you suggesting that this structure be:

an input to the user-defined expression? In this case, I would be hooked too far down the process and I could not control how these hunks are calculated.
or the expected output of the user-defined expression? In this case, is there a Vim API I could use to calculate valid hunks from strings? Something like xdiff(string_a, string_b, &diffopt)? (As a reminder: my intent is to split the diffed buffers into contiguous pieces and diff them piece by piece instead of diffing the whole buffers at once. So I would eventually need to produce a list of lists of chunks and merge it into one consistent list of chunks by myself.)
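The merging step mentioned in the reminder could look roughly like the sketch below, assuming each piece has already been diffed on its own and its hunks use line numbers relative to the piece. MergePieceHunks and the dictionary keys start_a, start_b and hunks are hypothetical names chosen for this illustration.

```vim
" Hypothetical helper, not part of Vim.
" a:pieces is a list of dicts, one per independently diffed piece:
"   start_a, start_b : first line of the piece in each full buffer
"   hunks            : [lnum_a, count_a, lnum_b, count_b] lists whose
"                      line numbers are relative to the piece (1-based)
" Returns one flat list of hunks with buffer-relative line numbers.
function! MergePieceHunks(pieces) abort
  let merged = []
  for piece in a:pieces
    " Offsets that turn piece-relative line numbers into buffer ones.
    let off_a = piece.start_a - 1
    let off_b = piece.start_b - 1
    for [lnum_a, count_a, lnum_b, count_b] in piece.hunks
      call add(merged, [lnum_a + off_a, count_a, lnum_b + off_b, count_b])
    endfor
  endfor
  return merged
endfunction

" Two pieces: one starting at A:1/B:1, one starting at A:28/B:45.
echo MergePieceHunks([
      \ {'start_a': 1, 'start_b': 1, 'hunks': [[3, 1, 3, 1]]},
      \ {'start_a': 28, 'start_b': 45, 'hunks': [[2, 1, 2, 1]]},
      \ ])
```

Because the pieces are contiguous and processed in order, the shifted hunks come out sorted, so plain concatenation is enough in this sketch.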

In any case, the user would still be missing information about the buffers currently being diffed together (up to 8). So like the v:fname_{in,new,out} variables available to &diffexpr, the following variables would be available to the second option:

v:bufnr_original
v:bufnr_new

.. or v:bufnrs containing up to 8 [bufnr_1, bufnr_2, bufnr_3, ..], depending on how vim works with 3+way diffs.

brammool commented 2 years ago

> > When deleting line 4 and inserting a line below 8 you would have: [[4, 1, 4, 0], [8, 0, 7, 1]]
>
> Oh, I understand that this is the internal representation of the diff actually used within vim when displaying the diffed buffers, right? Is this actually produced by internal calls to xdiff functions?
>
> What about "changed" lines? Where can I find documentation about this format?

It's similar to "ed" diffs. If one line is changed both the old and new line count are the same: [4, 1, 4, 1]
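One way to read the three cases seen so far (deletion, insertion, change) is by looking only at the two counts. The sketch below does that; HunkKind is a hypothetical name, and treating any hunk with two non-zero counts as a change is an assumption based on the examples in this thread.

```vim
" Hypothetical helper, not part of Vim: name the kind of edit that a
" [orig_lnum, orig_count, new_lnum, new_count] hunk describes.
function! HunkKind(hunk) abort
  let orig_count = a:hunk[1]
  let new_count = a:hunk[3]
  if orig_count > 0 && new_count == 0
    return 'delete'
  elseif orig_count == 0 && new_count > 0
    return 'insert'
  elseif orig_count > 0 && new_count > 0
    return 'change'
  endif
  " Both counts are zero: the hunk describes no edit at all.
  return 'empty'
endfunction

" The three hunks mentioned in the thread:
echo map([[4, 1, 4, 0], [8, 0, 7, 1], [4, 1, 4, 1]], 'HunkKind(v:val)')
```

For the three hunks quoted in the thread this gives 'delete', 'insert' and 'change' respectively.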

> Are you suggesting that this structure be:
>
> an input to the user-defined expression? In this case, I would be hooked too far down the process and I could not control how these hunks are calculated.
>
> or the expected output of the user-defined expression? In this case, is there a Vim API I could use to calculate valid hunks from strings? Something like xdiff(string_a, string_b, &diffopt)? (As a reminder: my intent is to split the diffed buffers into contiguous pieces and diff them piece by piece instead of diffing the whole buffers at once. So I would eventually need to produce a list of lists of chunks and merge it into one consistent list of chunks by myself.)

It's the output. You need to calculate it yourself, on a line-by-line basis. Changes within a line are not located.

> In any case, the user would still be missing information about the buffers currently being diffed together (up to 8). So like the v:fname_{in,new,out} variables available to &diffexpr, the following variables would be available to the second option:
>
> v:bufnr_original
>
> v:bufnr_new
>
> .. or v:bufnrs containing up to 8 [bufnr_1, bufnr_2, bufnr_3, ..], depending on how vim works with 3+way diffs.

It's probably better to have the user specify a function to call, like we have for other options. This function would be passed the list of buffer numbers, and return the list of lists with changes. We may also need to specify the tabpage, but most likely it's always the current tabpage.

iago-lito commented 2 years ago

Okay great, it seems that we are converging towards the following (bikeshed-able) design, aren't we?

" Custom 'internal diff function'.
set idifffn=MyDiff
function MyDiff(bufnrs, tabpage)
    " Read from given buffers.
    " Calculate diffs the way we like.
    " Calculate and return lists of chunks:
    return [
        "\ corresponding to a:bufnrs[0]
        \ [chunk, chunk, chunk, ...],
        "\ corresponding to a:bufnrs[1]
        \ [chunk, chunk, ...],
        \ ...
        \ ]
endfunction

I suppose that option &idifffn, when set, would take precedence over &diffexpr if also set?

I find that there is still one weakness to the new option that makes it not easy/friendly to use. Consider &diffexpr for example. In :h diff-diffexpr, we can read that the following setting:

set diffexpr=MyDiff()
function MyDiff()
   let opt = ""
   if &diffopt =~ "icase"
     let opt = opt . "-i "
   endif
   if &diffopt =~ "iwhite"
     let opt = opt . "-b "
   endif
   silent execute "!diff -a --binary " . opt . v:fname_in . " " . v:fname_new .
  \  " > " . v:fname_out
   redraw!
endfunction

does "almost the same as 'diffexpr' being empty". It is easy to write because we delegate all the diffing logic to the external diff program.

As far as I am aware, we could not write such a simple example with &idifffn yet, because there is no way to invoke Vim's internal diffing logic (i.e. producing valid chunks from strings) via a Vim script API. I think this would be worth adding along with the new option. Something like:

diffchunks(strings [, {opts}])                                *diffchunks()*
        Calculate diff chunks from up to 8 files given as a list of strings.
        The result is ready for use as output of |idifffn| expression.
        The diff options and algorithm used are the ones
        specified in |diffopt|, unless {opts} is given.