newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

attempting to remove file the command never completes using git-filter-repo 2.38.0 #421

Open jerryduvalEPS opened 1 year ago

jerryduvalEPS commented 1 year ago

I did a fresh clone, then ran the command below:

testpace % git filter-repo --invert-paths --path "lib/workbench/workbench.war"

immediately it says

Parsed 1 commits

Then continues to run forever, never stopping.

If I manually kill the execution it says

      ^CTraceback (most recent call last):
        File "/usr/local/bin/git-filter-repo", line 4005, in <module>
          main()
        File "/usr/local/bin/git-filter-repo", line 4002, in main
          filter.run()
        File "/usr/local/bin/git-filter-repo", line 3937, in run
          self._parser.run(self._input, self._output)
        File "/usr/local/bin/git-filter-repo", line 1409, in run
          self._parse_commit()
        File "/usr/local/bin/git-filter-repo", line 1260, in _parse_commit
          self._commit_callback(commit, aux_info)
        File "/usr/local/bin/git-filter-repo", line 3486, in _tweak_commit
          self._record_remapping(commit, orig_parents)
        File "/usr/local/bin/git-filter-repo", line 3261, in _record_remapping
          self._output.flush()
      KeyboardInterrupt

I'm running on macOS Big Sur

jerryduvalEPS commented 1 year ago

I ran it with debug output, might help

      jerryduval@Jerrys-Macbook-Pro-2 testpace % git filter-repo --debug --invert-paths --path lib/workbench/workbench.war
      [DEBUG] Passed arguments:
      Namespace(analyze=False, report_dir=None, inclusive=False, path_changes=[('filter', 'match', b'lib/workbench/workbench.war')], use_base_name=False, subdirectory_filter=None, to_subdirectory_filter=None, replace_text=None, max_blob_size=0, strip_blobs_with_ids=set(), tag_rename=None, replace_message=None, preserve_commit_hashes=False, preserve_commit_encoding=False, mailmap=None, replace_refs=None, prune_empty='auto', prune_degenerate='auto', no_ff=False, filename_callback=None, message_callback=None, name_callback=None, email_callback=None, refname_callback=None, blob_callback=None, commit_callback=None, tag_callback=None, reset_callback=None, source=None, target=None, help=False, version=False, force=False, partial=False, refs=['--all'], dry_run=False, debug=True, state_branch=None, stdin=False, quiet=False, repack=True)
      [DEBUG] Migrating refs/remotes/origin/ -> refs/heads/
      [DEBUG] Removing 'origin' remote (rewritten history will no longer be related; consider re-pushing it elsewhere.
      [DEBUG] Running: git fast-export --show-original-ids --signed-tags=strip --tag-of-filtered-object=rewrite --fake-missing-tagger --reference-excluded-parents --no-data --use-done-feature --reencode=yes --all
        (saving a copy of the output at .git/filter-repo/fast-export.original)
      [DEBUG] Running: git -c core.ignorecase=false fast-import --force --quiet
        (using the following file as input: .git/filter-repo/fast-export.filtered)
      Parsed 1 commits

newren commented 1 year ago

File "/usr/local/bin/git-filter-repo", line 3261, in _record_remapping self._output.flush()

Oh boy, buffering problems. If you change the following line of code (from the _record_remapping() function, and yes, I know it comes after the line that shows up in the backtrace):

      self._flush_renames(None, limit=40)

to use a limit of 20 or 10 instead of 40, does that help? What if you run on a different OS, such as Linux?

If neither of the above work, is the repository in question available for me to clone and look at?

jerryduvalEPS commented 1 year ago

I was able to get it working fine on Linux. Thanks for the help.

newren commented 1 year ago

Ok, glad it worked for you on Linux. I am still curious, though...

It appears from some searches that Linux has a pipe buffer size of 65536 bytes, whereas Mac OS X likely defaults to 16384 bytes but sometimes might be as small as a single system page depending on system load, meaning only 4096 bytes. (See https://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer/11954#11954 for all these claims.) I wonder if you were somehow stuck with a single page, though even then I thought the code would have handled it. I think I remembered as I was writing the code that some OSes (AIX?) had extremely small buffers, down to 4K. I thought I had done good enough back-of-the-envelope calculations to ensure my stuff would work even at that size (it really shouldn't ever need to buffer more than 41*81 = 3321 bytes at one time), so I'm curious whether my calculations were wrong somehow or whether your OS has a buffer even more constrained than that.
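
For anyone who wants to check these numbers on their own machine, here is a minimal Python sketch, not code from git-filter-repo. It fills a fresh pipe with non-blocking writes to estimate how much the pipe holds, then compares that against the 41*81 estimate above. The helper names pipe_capacity and worst_case_bytes are hypothetical, and the (limit + 1) * 81 formula is an assumption generalized from the single 41*81 figure in this thread. The measured capacity may vary with OS and system load, so treat it as a rough indication only.

```python
import os


def pipe_capacity(chunk_size=1024):
    """Estimate how many bytes a fresh pipe accepts before a write would block."""
    read_fd, write_fd = os.pipe()
    try:
        # Non-blocking mode makes a full pipe raise BlockingIOError
        # instead of hanging the process.
        os.set_blocking(write_fd, False)
        chunk = b"x" * chunk_size
        total = 0
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)


def worst_case_bytes(limit, bytes_per_entry=81):
    # Assumption: generalizes the 41*81 figure quoted in the thread
    # to (limit + 1) entries of 81 bytes each.
    return (limit + 1) * bytes_per_entry


if __name__ == "__main__":
    capacity = pipe_capacity()
    for limit in (40, 20, 10):
        need = worst_case_bytes(limit)
        print(f"limit={limit}: up to {need} bytes buffered; pipe holds about {capacity}")
```

If the reported capacity is comfortably above 3321 bytes and the hang still occurs with limit=40, that would suggest the estimate of what needs to be buffered is off rather than the pipe being unusually small.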

Any chance you could try changing the line of code that reads

      self._flush_renames(None, limit=40)

to use a value of 20 instead of 40 and see if that works on Mac OS X? (If not, don't worry about it, I'm mostly just curious at this point.)

jerryduvalEPS commented 1 year ago

Setting it to 20 worked on macOS.