newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

Empty target repository after filtering a directory #294

Closed ichbinsteffen closed 2 months ago

ichbinsteffen commented 2 years ago

I used the following command:

git filter-repo --force --debug --source sourceRepo/mv/ --path sourceRepo/mv/DataSchema/ --target targetRepo/

This was the output:

ibs@PC:~/filter-repo$ git filter-repo --force --debug --source sourceRepo/mv/ --path sourceRepo/mv/DataSchema/ --target targetRepo/
[DEBUG] Passed arguments:
Namespace(analyze=False, blob_callback=None, commit_callback=None, debug=True, dry_run=False, email_callback=None, filename_callback=None, force=True, help=False, inclusive=True, mailmap=None, max_blob_size=0, message_callback=None, name_callback=None, no_ff=False, partial=True, path_changes=[('filter', 'match', b'sourceRepo/mv/DataSchema/')], preserve_commit_encoding=False, preserve_commit_hashes=False, prune_degenerate='auto', prune_empty='auto', quiet=False, refname_callback=None, refs=['--all'], repack=True, replace_refs=None, replace_text=None, reset_callback=None, source=b'sourceRepo/mv/', state_branch=None, stdin=False, strip_blobs_with_ids=set(), subdirectory_filter=None, tag_callback=None, tag_rename=None, target=b'targetRepo/', to_subdirectory_filter=None, use_base_name=False, version=False)
[DEBUG] Running: git -C sourceRepo/mv/ fast-export --show-original-ids --signed-tags=strip --tag-of-filtered-object=rewrite --fake-missing-tagger --reference-excluded-parents --use-done-feature --mark-tags --reencode=yes --all
  (saving a copy of the output at targetRepo/.git/filter-repo/fast-export.original)
[DEBUG] Running: git -C targetRepo/ -c core.ignorecase=false fast-import --force --quiet
  (using the following file as input: targetRepo/.git/filter-repo/fast-export.filtered)
Parsed 54680 commits
New history written in 1331.36 seconds; now repacking/cleaning...
[DEBUG] Running (in targetRepo/): git reset --hard
[DEBUG] Running (in targetRepo/): git reflog expire --expire=now --all
[DEBUG] Running (in targetRepo/): git gc --prune=now
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), done.
Total 1 (delta 0), reused 0 (delta 0)
Completely finished after 1334.14 seconds.

Looks good so far... but:

ibs@PC:~/filter-repo/targetRepo$ ls -A
.git
ibs@PC:~/filter-repo/targetRepo$

As you can see, after running this the target repository contained nothing but the ".git" folder. I ran into the same problem on Windows and Linux; the output above is from the last run under Linux (WSL Ubuntu, git 2.25.1). I can open targetRepo/.git/filter-repo/fast-export.filtered (which is huge, by the way) and see lots of the repo's actual source code, but that's about it. git status reveals the obvious:

On branch master
No commits yet

As empty as before. Am I doing something wrong? Why am I getting an empty repo?

newren commented 2 years ago

I think the problem here is that --path is supposed to operate on files as found in the git repository, not relative to whatever directory you originally ran from (i.e., filenames that git [-C gitrepo] log --name-only would have shown or git [-C gitrepo] ls-files would have shown). But let me ask a few more detailed questions...
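To make that distinction concrete, here is a simplified Python model (an illustrative assumption, not filter-repo's own code) of how a directory-style --path argument is compared against file names as git stores them, i.e. what git -C sourceRepo/mv ls-files would print. The listed files are hypothetical.

```python
from __future__ import annotations


def path_matches(path_arg: str, repo_path: str) -> bool:
    """Simplified model of a --path match: an exact file match, or
    repo_path lies under the directory path_arg (trailing slash optional)."""
    if path_arg == repo_path:
        return True
    prefix = path_arg if path_arg.endswith("/") else path_arg + "/"
    return repo_path.startswith(prefix)


def kept_paths(path_arg: str, repo_paths: list[str]) -> list[str]:
    """Return the repo paths that a --path filter would keep in this model."""
    return [p for p in repo_paths if path_matches(path_arg, p)]


# Hypothetical output of `git -C sourceRepo/mv ls-files`: these paths are
# relative to the root of the source repository, not to the shell's cwd.
repo_files = [
    "DataSchema/schema.xsd",
    "DataSchema/types/common.xsd",
    "src/main.c",
]

# The cwd-relative spelling matches nothing.
print(kept_paths("sourceRepo/mv/DataSchema/", repo_files))  # []
# The repo-relative spelling keeps the DataSchema files.
print(kept_paths("DataSchema/", repo_files))
# ['DataSchema/schema.xsd', 'DataSchema/types/common.xsd']
```

In this model the cwd-relative spelling keeps no files at all; once every file is filtered out, each commit becomes empty and is pruned, which would be consistent with the empty target reported above.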

Is sourceRepo/mv/ actually a git repository (i.e., does sourceRepo/mv/.git exist)? If not, perhaps your --source should have been sourceRepo?

Also, I'm assuming targetRepo/ is a git repository, i.e. targetRepo/.git exists before your command.

Now, what kind of files exist under sourceRepo/mv/? (Or is it just sourceRepo/?) Do you actually expect to see files underneath that directory named sourceRepo/mv/DataSchema/...? If so, that would suggest that the directory you ran your filter-repo command from contains files named sourceRepo/mv/sourceRepo/mv/DataSchema/... The fact that targetRepo ends up empty suggests you don't have such paths.
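If the directory is only known by its location relative to where you run the command, one way to get the repo-relative spelling that --path expects is to strip the source repository's directory from it. This is a minimal sketch assuming --source points at the repository's top-level directory (the one containing .git, which is what the question above checks); to_repo_relative is a hypothetical helper name.

```python
from __future__ import annotations

import posixpath


def to_repo_relative(path_from_cwd: str, source_dir: str) -> str:
    """Rewrite a path given relative to the current directory into a path
    relative to the source repository root (hypothetical helper). A trailing
    slash is kept so the result still reads as a directory for --path."""
    rel = posixpath.relpath(posixpath.normpath(path_from_cwd),
                            posixpath.normpath(source_dir))
    # "." is the repo root itself; ".." or "../x" lies outside the repo.
    if rel in (".", "..") or rel.startswith("../"):
        raise ValueError(f"{path_from_cwd!r} is not inside {source_dir!r}")
    return rel + "/" if path_from_cwd.endswith("/") else rel


arg = to_repo_relative("sourceRepo/mv/DataSchema/", "sourceRepo/mv/")
print(arg)  # DataSchema/
```

With the reporter's paths, and again assuming sourceRepo/mv/ is the repository root, the command would become git filter-repo --force --debug --source sourceRepo/mv/ --path DataSchema/ --target targetRepo/.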

johentsch commented 2 years ago

Edit: The workaround is not to use an absolute path for --target, as I first thought, but to copy the filtered repo containing sub1 before filtering it down to sub2.

I've run into the same trouble: the same command that worked perfectly last year suddenly resulted in an empty target. Even worse, the path filter did not work even without specifying a target repo.

Expected

Running git filter-repo --subdirectory-filter sub1/sub2 --target ../empty_repo within the source repo should result in the contents of sub2 ending up in the root of empty_repo. (@ichbinsteffen, note that --subdirectory-filter is just sugar for combining --path with --path-rename, so I think we're experiencing the same problem.)
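The debug output further down shows this desugaring directly: path_changes holds a ('filter', 'match', b'sub2/') entry followed by ('rename', 'match', [b'sub2/', b'']). The sketch below models that pair as "keep files under the directory, then strip the prefix" over a list of repo paths, and checks that filtering sub1 and then sub2 keeps the same files as filtering sub1/sub2 in one go. It is a simplified model under those assumptions, not filter-repo's code, and the file names are hypothetical.

```python
from __future__ import annotations


def subdirectory_filter(directory: str, repo_paths: list[str]) -> list[str]:
    """Model of --subdirectory-filter DIR as --path DIR/ plus
    --path-rename DIR/: keep files under DIR/ and strip that prefix."""
    prefix = directory.rstrip("/") + "/"
    # 'filter' step: keep only paths under the directory.
    kept = [p for p in repo_paths if p.startswith(prefix)]
    # 'rename' step: move the kept paths up to the repository root.
    return [p[len(prefix):] for p in kept]


repo_files = [
    "README.md",
    "sub1/notes.txt",
    "sub1/sub2/a.txt",
    "sub1/sub2/lib/b.txt",
]

one_step = subdirectory_filter("sub1/sub2", repo_files)
two_step = subdirectory_filter("sub2", subdirectory_filter("sub1", repo_files))
print(one_step)              # ['a.txt', 'lib/b.txt']
print(one_step == two_step)  # True
```

In path terms the one-step and two-step routes are equivalent in this model, so a difference between them in practice points at something other than the path logic itself.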

Bug

The output on the command line looks perfectly normal and no error messages are shown, but empty_repo remains empty.

This happens with version 2.34.0.

Troubleshooting & Workaround

I tested git filter-repo --subdirectory-filter sub1/sub2, i.e., without specifying a target, and it emptied the source repo completely, even though sub2 was not empty! What did work was git filter-repo --subdirectory-filter sub1. However, git filter-repo --subdirectory-filter sub1 --target ../empty_repo did not (nor did specifying the target as an absolute path).

So what I did to make it work:

Output of git filter-repo --subdirectory-filter sub2 --target ../empty_repo --debug after having run git filter-repo --subdirectory-filter sub1 (i.e., without copying the result first; empty_repo stays empty):

[DEBUG] Passed arguments:
Namespace(analyze=False, report_dir=None, inclusive=True, path_changes=[('filter', 'match', b'sub2/'), ('rename', 'match', [b'sub2/', b''])], use_base_name=False, subdirectory_filter=None, to_subdirectory_filter=None, replace_text=None, max_blob_size=0, strip_blobs_with_ids=set(), tag_rename=None, replace_message=None, preserve_commit_hashes=False, preserve_commit_encoding=False, mailmap=None, replace_refs=None, prune_empty='auto', prune_degenerate='auto', no_ff=False, filename_callback=None, message_callback=None, name_callback=None, email_callback=None, refname_callback=None, blob_callback=None, commit_callback=None, tag_callback=None, reset_callback=None, source=None, target=b'../empty_repo', help=False, version=False, force=False, partial=True, refs=['--all'], dry_run=False, debug=True, state_branch=None, stdin=False, quiet=False, repack=True)
[DEBUG] Running: git fast-export --show-original-ids --signed-tags=strip --tag-of-filtered-object=rewrite --fake-missing-tagger --reference-excluded-parents --use-done-feature --mark-tags --reencode=yes --all
  (saving a copy of the output at ../empty_repo/.git/filter-repo/fast-export.original)
[DEBUG] Running: git -C ../empty_repo -c core.ignorecase=false fast-import --force --quiet --date-format=raw-permissive
  (using the following file as input: ../empty_repo/.git/filter-repo/fast-export.filtered)
Parsed 252 commits
New history written in 5.01 seconds; now repacking/cleaning...
[DEBUG] Running (in ../empty_repo): git reset --hard
[DEBUG] Running (in ../empty_repo): git reflog expire --expire=now --all
[DEBUG] Running (in ../empty_repo): git gc --prune=now
Enumerating objects: 3346, done.
Counting objects: 100% (3346/3346), done.
Delta compression using up to 12 threads
Compressing objects: 100% (622/622), done.
Writing objects: 100% (3346/3346), done.
Total 3346 (delta 2720), reused 3326 (delta 2714), pack-reused 0
Completely finished after 5.67 seconds.

Output of git filter-repo --subdirectory-filter sub2 --target ../empty_repo --debug after having run git filter-repo --subdirectory-filter sub1 and copied the result (workaround that works):

[DEBUG] Passed arguments:
Namespace(analyze=False, report_dir=None, inclusive=True, path_changes=[('filter', 'match', b'sub2/'), ('rename', 'match', [b'sub2/', b''])], use_base_name=False, subdirectory_filter=None, to_subdirectory_filter=None, replace_text=None, max_blob_size=0, strip_blobs_with_ids=set(), tag_rename=None, replace_message=None, preserve_commit_hashes=False, preserve_commit_encoding=False, mailmap=None, replace_refs=None, prune_empty='auto', prune_degenerate='auto', no_ff=False, filename_callback=None, message_callback=None, name_callback=None, email_callback=None, refname_callback=None, blob_callback=None, commit_callback=None, tag_callback=None, reset_callback=None, source=None, target=b'../empty_repo', help=False, version=False, force=False, partial=True, refs=['--all'], dry_run=False, debug=True, state_branch=None, stdin=False, quiet=False, repack=True)
[DEBUG] Running: git fast-export --show-original-ids --signed-tags=strip --tag-of-filtered-object=rewrite --fake-missing-tagger --reference-excluded-parents --use-done-feature --mark-tags --reencode=yes --all
  (saving a copy of the output at ../empty_repo/.git/filter-repo/fast-export.original)
[DEBUG] Running: git -C ../empty_repo -c core.ignorecase=false fast-import --force --quiet --date-format=raw-permissive
  (using the following file as input: ../empty_repo/.git/filter-repo/fast-export.filtered)
Parsed 253 commits
New history written in 16.63 seconds; now repacking/cleaning...
[DEBUG] Running (in ../empty_repo): git reset --hard
HEAD is now at 1234567 [LATEST COMMIT OF newSource]
[DEBUG] Running (in ../empty_repo): git reflog expire --expire=now --all
[DEBUG] Running (in ../empty_repo): git gc --prune=now
Enumerating objects: 162, done.
Counting objects: 100% (162/162), done.
Delta compression using up to 12 threads
Compressing objects: 100% (34/34), done.
Writing objects: 100% (162/162), done.
Total 162 (delta 131), reused 159 (delta 128), pack-reused 0
Completely finished after 16.70 seconds.

There is certainly something fishy here. Unfortunately, I don't know the last git-filter-repo version with which it still worked.

newren commented 2 years ago

@johentsch: That seems like a different issue to me; the original reporter seemed to have trouble specifying paths as stored in git when using --source and --target, and had no renaming going on. You had an example where you claimed you had problems without either --source or --target, but you were using path renaming. It's possible the issues are related, but we don't have much in the way of evidence and never got more info from @ichbinsteffen on their issue.

Could you open a separate issue for your problem? And let's focus on "git filter-repo --subdirectory-filter sub1/sub2" without the --source and --target arguments until we understand just that issue. Note, though, that I can't reproduce your problem. git filter-repo --subdirectory-filter sub1/sub2 works great for me on a repo where git log -- sub1/sub2 shows lots of commits, and will retain those commits in the filtered result (and mostly just those commits, but likely also some merge commits needed to flesh out the full topology). It'd be great if you could link to an example repo and use exact commands (i.e., not "sub1/sub2" but the real directory names) and compare the output of "git rev-list --count HEAD -- sub1/sub2" before the filtering with "git rev-list --count HEAD" after filtering.
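The before/after comparison suggested above can be scripted. This sketch shells out to git via subprocess and returns the number printed by git rev-list --count HEAD, optionally restricted to a path. It assumes git is on PATH and that HEAD exists in the repository being queried; the function names are illustrative.

```python
from __future__ import annotations

import subprocess


def rev_list_count_argv(repo: str, path: str | None = None) -> list[str]:
    """Build `git -C REPO rev-list --count HEAD [-- PATH]`."""
    argv = ["git", "-C", repo, "rev-list", "--count", "HEAD"]
    if path is not None:
        # "--" separates revisions from the pathspec.
        argv += ["--", path]
    return argv


def rev_list_count(repo: str, path: str | None = None) -> int:
    """Run the command and parse the single integer it prints.
    Raises CalledProcessError if git fails (e.g. HEAD does not exist)."""
    result = subprocess.run(rev_list_count_argv(repo, path),
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())
```

For example, rev_list_count("original_repo", "sub1/sub2") before filtering versus rev_list_count("filtered_repo") afterwards, where both directory names are hypothetical. The two numbers may differ somewhat, for instance because of the merge commits kept to flesh out the topology as described above; a filtered count that fails outright because HEAD is missing would match the "No commits yet" symptom.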

ichbinsteffen commented 2 years ago

Hey @newren, I'll report back on Tuesday. In the end I got it working somehow and the working script I use is on my computer at work. So I'll report back on what I did differently to get it to work as intended.

vunhatchuong commented 1 year ago

I have the same empty-repo problem when using it on Windows, installed with pip in a conda env. So I opened up WSL2, installed it through apt, and the same command works.

newren commented 2 months ago

@vunhatchuong :

> I have the same problem of empty repo when uses on Windows. Installed with pip in conda env. So I open up WSL2, install through apt and the same command works.

That sounds like one of the PowerShell issues we have in various other open issues; PowerShell sounds somewhat problematic. Glad WSL2 worked for you, but we're taking this issue off-topic.

> I'll report back on Tuesday. In the end I got it working somehow and the working script I use is on my computer at work. So I'll report back on what I did differently to get it to work as intended.

Not sure which Tuesday was intended, but I'm pretty sure there's no point in continuing to wait after a few years. So let's close this one out.