newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)
Other
8.55k stars 708 forks source link

Renaming path and then renaming it back to the original name deletes path rather than renaming it (renaming to another path that exists has weird side effects) #569

Open gdlxn-ibm opened 5 months ago

gdlxn-ibm commented 5 months ago

Using the --replace-refs delete-no-add workaround from Bug #401, I've found a scenario where git filter-repo --path-rename <source-path>:<target-path> deletes the source path rather than renaming it to the target path as shown below.

Create a new Git repository

gdlxn@geoff-test4:~/test$ git init -b master test-repo
Initialized empty Git repository in /home/gdlxn/test/test-repo/.git/
gdlxn@geoff-test4:~/test$ cd test-repo/
gdlxn@geoff-test4:~/test/test-repo$ ls -al
total 12
drwxrwxr-x 3 gdlxn gdlxn 4096 Jun  7 00:14 .
drwxrwxr-x 3 gdlxn gdlxn 4096 Jun  7 00:14 ..
drwxrwxr-x 7 gdlxn gdlxn 4096 Jun  7 00:14 .git

Perform initial Git commit

gdlxn@geoff-test4:~/test/test-repo$ git commit --allow-empty --cleanup=verbatim -m "Initial commit"
[master (root-commit) d2c2b96] Initial commit
gdlxn@geoff-test4:~/test/test-repo$ ls -al
total 12
drwxrwxr-x 3 gdlxn gdlxn 4096 Jun  7 00:14 .
drwxrwxr-x 3 gdlxn gdlxn 4096 Jun  7 00:14 ..
drwxrwxr-x 8 gdlxn gdlxn 4096 Jun  7 00:17 .git

Create and commit file f1

gdlxn@geoff-test4:~/test/test-repo$ touch f1
gdlxn@geoff-test4:~/test/test-repo$ git add -A -f
gdlxn@geoff-test4:~/test/test-repo$ git commit --allow-empty --cleanup=verbatim -m "Add f1"
[master d595038] Add f1
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 f1
gdlxn@geoff-test4:~/test/test-repo$ ls -al
total 12
drwxrwxr-x 3 gdlxn gdlxn 4096 Jun  7 00:19 .
drwxrwxr-x 3 gdlxn gdlxn 4096 Jun  7 00:14 ..
drwxrwxr-x 8 gdlxn gdlxn 4096 Jun  7 00:19 .git
-rw-rw-r-- 1 gdlxn gdlxn    0 Jun  7 00:19 f1

Rename file f1 to f2

gdlxn@geoff-test4:~/test/test-repo$ git filter-repo --replace-refs delete-no-add --path-rename f1:f2 --force
Parsed 2 commits
New history written in 0.03 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 707eb11 Add f1
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (5/5), done.
Total 5 (delta 0), reused 0 (delta 0), pack-reused 0
Completely finished after 0.12 seconds.
gdlxn@geoff-test4:~/test/test-repo$ ls -al
total 12
drwxrwxr-x 3 gdlxn gdlxn 4096 Jun  7 00:21 .
drwxrwxr-x 3 gdlxn gdlxn 4096 Jun  7 00:14 ..
drwxrwxr-x 9 gdlxn gdlxn 4096 Jun  7 00:21 .git
-rw-rw-r-- 1 gdlxn gdlxn    0 Jun  7 00:21 f2

Create and commit new f1 file

gdlxn@geoff-test4:~/test/test-repo$ touch f1
gdlxn@geoff-test4:~/test/test-repo$ git add -A -f
gdlxn@geoff-test4:~/test/test-repo$ git commit --allow-empty --cleanup=verbatim -m "Add new f1"
[master 3a36c64] Add new f1
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 f1
gdlxn@geoff-test4:~/test/test-repo$ ls -al
total 12
drwxrwxr-x 3 gdlxn gdlxn 4096 Jun  7 00:23 .
drwxrwxr-x 3 gdlxn gdlxn 4096 Jun  7 00:14 ..
drwxrwxr-x 9 gdlxn gdlxn 4096 Jun  7 00:23 .git
-rw-rw-r-- 1 gdlxn gdlxn    0 Jun  7 00:23 f1
-rw-rw-r-- 1 gdlxn gdlxn    0 Jun  7 00:21 f2

Delete and commit new f1 file

gdlxn@geoff-test4:~/test/test-repo$ rm f1
gdlxn@geoff-test4:~/test/test-repo$ git add -A -f
gdlxn@geoff-test4:~/test/test-repo$ git commit --allow-empty --cleanup=verbatim -m "Delete new f1"
[master 7c3c32c] Delete new f1
 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 f1
gdlxn@geoff-test4:~/test/test-repo$ ls -al
total 12
drwxrwxr-x 3 gdlxn gdlxn 4096 Jun  7 00:25 .
drwxrwxr-x 3 gdlxn gdlxn 4096 Jun  7 00:14 ..
drwxrwxr-x 9 gdlxn gdlxn 4096 Jun  7 00:25 .git
-rw-rw-r-- 1 gdlxn gdlxn    0 Jun  7 00:21 f2

Rename file f2 to f1

gdlxn@geoff-test4:~/test/test-repo$ git filter-repo --replace-refs delete-no-add --path-rename f2:f1 --force
Parsed 4 commits
New history written in 0.03 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 243c0f1 Delete new f1
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 8 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (7/7), done.
Total 7 (delta 2), reused 2 (delta 0), pack-reused 0
Completely finished after 0.14 seconds.
gdlxn@geoff-test4:~/test/test-repo$ ls -al
total 12
drwxrwxr-x 3 gdlxn gdlxn 4096 Jun  7 00:27 .
drwxrwxr-x 3 gdlxn gdlxn 4096 Jun  7 00:14 ..
drwxrwxr-x 9 gdlxn gdlxn 4096 Jun  7 00:27 .git

Note that git filter-repo --replace-refs delete-no-add --path-rename f2:f1 --force incorrectly deleted file f2 rather than renaming it to f1

gdlxn-ibm commented 5 months ago

This problem looks to be an example of "unexpected" behavior that can occur when the new name in --path-rename <old_name:new_name> already exists in the commit history.

In the above scenario, the path rename of f1 to f2 works as expected as f2 does not exist in the commit history at the time of the path rename. It simply replaces f1 with f2 in the second commit.

On the other hand, the path rename of f2 to f1 show "unexpected" behavior as f1 already exists in commits 3 and 4 at the time of the path rename. The path rename replaces f2 with f1 in the second commit, which then "collides" with the existing f1 in commits 3 and 4. In particular, the f1 delete in commit 4 causes f1 to be deleted.

@newren I'm not exactly sure what you intend to happen when the new name in --path-rename <old_name:new_name> already exists in the commit history, though I suspect that it is working "as designed". Because of the potential for unexpected behavior, would it be possible to have --path-rename issue a warning message if the new name already exists in the commit history? I would also suggest updating the --path-rename documentation in the user manual with a warning about possible unexpected behavior when new name already exists in the commit history.

newren commented 4 months ago

Yeah, working as expected. Unfortunately, attempting to determine if the new pathnames are in the commit history, within a relevant range of commits, is computationally quite expensive and can become challenging in terms of tracking the topology correctly. This can be even worse since the renaming can happen with --path-rename or --paths-from-file or --filename-callback and might be a prefix, or a glob, or a regex, or some more general function making it hard to even enumerate all the new names. I added a very simplistic check (basically only triggering if there was a commit which modified both the old and new filenames when users tried to rename one to the other -- in that case filter-repo will print a "File renaming caused colliding pathnames" error and abort). That can sometimes help users, but it leaves a lot of cases out because I don't think a more general one is really feasible.

I'd be very happy to take a patch to add a warning to the manual.