radicle-dev / radicle-git

Everything Radicle growing around Git
Other
35 stars 5 forks source link

add diff_file method for Repository #106

Closed keepsimple1 closed 1 year ago

keepsimple1 commented 1 year ago

This is to solve issue #40 .

This patch turns out larger than I expected as I ended up to refactor the FileDiff type. The serialization of Diff is kept compatible.

A new branch diff-test for git-platinum was created for testing diff serialization.

keepsimple1 commented 1 year ago
  1. The diff contents and kinds can be mismatched, e.g. if I match on the copied kind and contents then I have to ignore binary or plain because they wouldn't make any sense.

As Git detects moved / copied based on a similarity threshold that can be < 100%, the DiffContent might not always be empty for the copied kind.

I did a test locally with git, and saw that happening:
(readme3.md is a new file copied off and modified a bit, then I used the threshold of 90% for copy detection).

$ diff emoji.txt readme3.md
2a3
> a line.

$ git log --summary -C90 -1
commit f2c4e04263ea182f7a811bfc9a82addab62870e5 (HEAD -> diff-test)
Author: Han Xu <keepsimple@gmail.com>
Date:   Tue Jan 31 05:14:20 2023 -0800

    readme3.md

 copy emoji.txt => readme3.md (94%)

(Btw, I had to enable copy detection via git config --global --add diff.renames copy)

keepsimple1 commented 1 year ago

2. The added, deleted, etc. methods return FileDiff which if matched on have variants that need to be ignored.

Fixed this one by changing FileDiff into an enum and removed DiffKind. Now they return the types Added, Deleted, etc.

keepsimple1 commented 1 year ago

3. I know EofNewLine is existing code, but I feel like it can happen with added files too. Something to look into.

EofNewLine is a bit confusing to me. I'd like to investigate this in a separate PR.

FintanH commented 1 year ago
  1. The diff contents and kinds can be mismatched, e.g. if I match on the copied kind and contents then I have to ignore binary or plain because they wouldn't make any sense.

As Git detects moved / copied based on a similarity threshold that can be < 100%, the DiffContent might not always be empty for the copied kind.

I did a test locally with git, and saw that happening: (readme3.md is a new file copied off and modified a bit, then I used the threshold of 90% for copy detection).

$ diff emoji.txt readme3.md
2a3
> a line.

$ git log --summary -C90 -1
commit f2c4e04263ea182f7a811bfc9a82addab62870e5 (HEAD -> diff-test)
Author: Han Xu <keepsimple@gmail.com>
Date:   Tue Jan 31 05:14:20 2023 -0800

    readme3.md

 copy emoji.txt => readme3.md (94%)

(Btw, I had to enable copy detection via git config --global --add diff.renames copy)

Great point, and nice find :)