newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

git filter-repo --path fails on gnu coreutils repo. #600

Open gl-yziquel opened 2 months ago

gl-yziquel commented 2 months ago

Hi.

Version: git-filter-repo v2.45.0

git clone git://git.sv.gnu.org/coreutils.git
cd coreutils
git filter-repo --path src/chroot.c

Fails as follows:

Parsed 29641 commits
Traceback (most recent call last):
  File "/home/mini-me/.pyenv/versions/3.11.8/bin/git-filter-repo", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/mini-me/.pyenv/versions/3.11.8/lib/python3.11/site-packages/git_filter_repo.py", line 4032, in main
    filter.run()
  File "/home/mini-me/.pyenv/versions/3.11.8/lib/python3.11/site-packages/git_filter_repo.py", line 3967, in run
    self._parser.run(self._input, self._output)
  File "/home/mini-me/.pyenv/versions/3.11.8/lib/python3.11/site-packages/git_filter_repo.py", line 1418, in run
    self._parse_tag()
  File "/home/mini-me/.pyenv/versions/3.11.8/lib/python3.11/site-packages/git_filter_repo.py", line 1297, in _parse_tag
    (tagger_name, tagger_email, tagger_date) = self._parse_user(b'tagger')
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mini-me/.pyenv/versions/3.11.8/lib/python3.11/site-packages/git_filter_repo.py", line 1084, in _parse_user
    (name, email, when) = user_regex.match(self._currentline).groups()
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'groups'
fatal: stream ends early
fast-import: dumping crash report to .git/fast_import_crash_1886814
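The last frames of the traceback show the mechanism: `user_regex.match(...)` returned `None` for a tagger line, and `.groups()` was then called on that `None`. Below is a minimal Python sketch of that failure. The pattern is a simplified stand-in that I am assuming for illustration; it is not git-filter-repo's exact regex. The malformed lines are hypothetical, because the report does not show what `git fast-export` actually emitted for the bad tags.

```python
import re
from typing import Optional, Tuple

# Simplified stand-in for the pattern git-filter-repo applies to the
# "tagger ..." lines it reads from git fast-export. This is an assumption
# for illustration, not the tool's exact regex.
TAGGER_RE = re.compile(rb"tagger (.*?) <(.*?)> (.*)\n$")


def parse_tagger(line: bytes) -> Optional[Tuple[bytes, bytes, bytes]]:
    """Return (name, email, when), or None if the line does not match."""
    match = TAGGER_RE.match(line)
    # The traceback shows _parse_user calling .groups() on the match result
    # unconditionally; when match is None, that call raises
    # "AttributeError: 'NoneType' object has no attribute 'groups'".
    return match.groups() if match else None


well_formed = b"tagger Jane Doe <jane@example.org> 1136073600 +0000\n"
# Hypothetical malformed lines: no space after '>', or no date at all.
no_space = b"tagger Jane Doe <jane@example.org>1136073600 +0000\n"
no_date = b"tagger Jane Doe <jane@example.org>\n"

print(parse_tagger(well_formed))  # (b'Jane Doe', b'jane@example.org', b'1136073600 +0000')
print(parse_tagger(no_space))     # None
print(parse_tagger(no_date))      # None
```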

I suspect this is because the GNU coreutils repo has a very long history going back to older version control systems, and somewhere in that troubled past something ends up as None.

Nonetheless, this points to some unwanted brittleness in git-filter-repo.

P.S.: thank you for that tool. It has proven to be invaluable to me.

gl-yziquel commented 2 months ago

I can confirm that this issue is specific to the coreutils repository. The same operation succeeded on other repositories, for example carving the cipd source code out of Google's luci-go repository. So something specific to the way coreutils is set up is what makes git filter-repo go nuts.

Mateossss280 commented 2 months ago

git clone git://git.sv.gnu.org/coreutils.git
cd coreutils
git filter-repo --path src/chroot.c

gl-yziquel commented 2 months ago

git clone git://git.sv.gnu.org/coreutils.git
cd coreutils
git filter-repo --path src/chroot.c

What is your point?

Mateossss280 commented 2 months ago

I don't know yet.

newren commented 1 month ago

  File "/home/mini-me/.pyenv/versions/3.11.8/lib/python3.11/site-packages/git_filter_repo.py", line 1297, in _parse_tag
    (tagger_name, tagger_email, tagger_date) = self._parse_user(b'tagger')
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mini-me/.pyenv/versions/3.11.8/lib/python3.11/site-packages/git_filter_repo.py", line 1084, in _parse_user
    (name, email, when) = user_regex.match(self._currentline).groups()
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'groups'
fatal: stream ends early
fast-import: dumping crash report to .git/fast_import_crash_1886814

Looks to me like a duplicate of https://github.com/newren/git-filter-repo/issues/263, except this time with the tagger name rather than the author name. Basically, this is corrupt history that git tends to just overlook. Does git fsck report the missing tagger for one of your tags in this repository? I'm guessing the same style of workaround used in the other report could probably be applied to this repo as well, but you'd be replacing a tag object rather than a commit object.
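A minimal Python sketch of what "replacing a tag object" could look like, under stated assumptions: the corrected tagger line is something you write yourself after inspecting the object, the tag ref (e.g. a hypothetical `refs/tags/v1.0`) points directly at the bad tag object, and the new tag is attached by re-pointing that ref with `git update-ref` rather than with `git replace`. Whether this matches the workaround used in #263 is an assumption. `replace_tagger_line` and `rewrite_tag` are hypothetical helper names. If a tag is signed, its signature sits in the message and is kept byte-for-byte, but it will no longer verify against the rewritten object.

```python
import subprocess


def replace_tagger_line(raw_tag: bytes, new_tagger: bytes) -> bytes:
    """Swap the 'tagger' header line of a raw tag object for new_tagger.

    Only the header (everything before the first blank line) is touched,
    so the tag message and any signature are left unchanged.
    """
    header, sep, body = raw_tag.partition(b"\n\n")
    lines = header.split(b"\n")
    replaced = False
    for i, line in enumerate(lines):
        if line.startswith(b"tagger "):
            lines[i] = new_tagger.rstrip(b"\n")
            replaced = True
    if not replaced:
        raise ValueError("no tagger line in tag header")
    return b"\n".join(lines) + sep + body


def rewrite_tag(old_oid: str, tag_ref: str, new_tagger: bytes) -> str:
    """Write a corrected copy of tag old_oid and point tag_ref at it.

    Returns the object ID of the newly written tag object.
    """
    # Raw (not pretty-printed) contents of the existing tag object.
    raw = subprocess.run(["git", "cat-file", "tag", old_oid],
                         check=True, capture_output=True).stdout
    fixed = replace_tagger_line(raw, new_tagger)
    # Store the corrected bytes as a new tag object and get its ID.
    new_oid = subprocess.run(["git", "hash-object", "-t", "tag", "-w", "--stdin"],
                             input=fixed, check=True,
                             capture_output=True).stdout.decode().strip()
    # Passing old_oid makes update-ref refuse to move the ref if it no
    # longer points at the bad tag.
    subprocess.run(["git", "update-ref", tag_ref, new_oid, old_oid], check=True)
    return new_oid
```

For example, `rewrite_tag(oid, "refs/tags/v1.0", b"tagger Jane Doe <jane@example.org> 1136073600 +0000")` would do this for one tag; the ref name and tagger line here are placeholders, not values from this repository.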

gl-yziquel commented 1 month ago

Does git fsck report the missing tagger for one of your tags in this repository?

mini-me@virtucon ~/h/c/c/git (master)> git fsck
Checking object directories: 100% (256/256), done.
error in tag a6727941433ee1c91a20ede6cb381af1d18c566d: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag dea19e516ac764cb950ebf10b58820ca1f69c9bd: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag d8b73fb18f492a393167405f75781b2d4961ab20: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag fe2b74e040df3c3c0c21c702097432836a8181a2: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 14f3c6d0962bd0c9413a847e18e3ff5322ee2ef4: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag ed6955c3f13cea73c7ae2bb93af010b84dbb92bb: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag c2666f9f7e130146349b581d38627c6351eb4746: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag a990adee5f2b409be5e9f7449a69f45b59f0546d: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 05dd38e409028226e2173da3dfeffc91c66e3068: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 7471311089aace5fed7286445c3e1a3737f9cf46: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 7a3bc99a38742a096868942230247d6be7891e20: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 2f746d50a2ca3497c77165a797a7ac96e5234677: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 820eaccdd382807833a5a6e9898f581f2dd780d7: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag c8b7e54afa1e798c9cfe8ded9f19dfda5c868b59: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 74d0400f45d3eb4f3978ddfdad49a685e8d1ef4e: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 8db0289086c3d00806a7d242f67e59e13cc5c0fd: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 0947bb3fee67cb3102a2399aa637e54c91ba2f88: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 4296547b15320bb58de3528314d351218ffe57ba: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 7cb5f9d5a5a77fd497b762e5bdb24d35a17515c2: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 241e9ac9aff4ff2146ea9805ffe489fb7bb4a884: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 7a28020ef79cbfd701e5d6c4c66b4ea5ff3b5443: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 47ab287085225cb5210a6e45c00d469c009eb7e3: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 824dd81a2c75dc65a3e781b6388772c1d473fcc2: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 90a8d6e19e831674b648f9cd79cebcbe98e086b4: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 551ac04bb0d00c14af3f98b3a2744e6e66aa0b3a: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 9247b3ed1deeef381ef95058221e567eb9120020: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 17ba86bfcc23d0ce6d43c41da4fb88c0fcaf54f1: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag 631792ec61c2f6be50afe0929e67cca5b128e8cf: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag c6f010cb5b2b3a6978f854e97cb40c37f9321406: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in tag aabd963243b80c375b38f7ddc6e4827bcd40de51: missingSpaceBeforeDate: invalid author/committer line - missing space before date
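With this many flagged objects, it helps to collect the IDs programmatically before inspecting them. The following is a minimal Python sketch, assuming the `error in tag <id>: <msg-id>: ...` line format shown above. `bad_tags` and `run_fsck` are hypothetical helper names.

```python
import re
import subprocess

# Matches fsck lines like the ones above:
#   error in tag <object id>: <msg id>: <description>
# 40 hex digits for SHA-1 repositories, 64 for SHA-256 ones.
FSCK_TAG_ERROR = re.compile(r"^error in tag ([0-9a-f]{40,64}): (\w+):", re.MULTILINE)


def bad_tags(fsck_output: str, msg_id: str = "missingSpaceBeforeDate") -> list[str]:
    """Return the object IDs of tags that fsck flagged with msg_id."""
    return [oid for oid, mid in FSCK_TAG_ERROR.findall(fsck_output) if mid == msg_id]


def run_fsck() -> str:
    """Run git fsck in the current repository and return everything it printed.

    fsck exits non-zero when it finds problems, so check=True is not used.
    Both streams are returned so the parser does not depend on which one
    fsck uses for a given message.
    """
    result = subprocess.run(["git", "fsck"], capture_output=True, text=True)
    return result.stdout + result.stderr
```

Inside the clone, `for oid in bad_tags(run_fsck()): print(oid)` would list the tag objects worth looking at.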

The first commit that is singled out is rather recent:

commit ac5213acbae8ab2e17589acc7c89b88b2d0e62ef (HEAD -> master, origin/master, origin/HEAD)
Author: Pádraig Brady <P@draigBrady.com>
Date:   Mon Sep 23 21:44:38 2024 +0100

But I don't quite understand what's wrong per se with the Date field.

newren commented 1 month ago

The first commit that is singled out is rather recent:

commit ac5213acbae8ab2e17589acc7c89b88b2d0e62ef (HEAD -> master, origin/master, origin/HEAD)
Author: Pádraig Brady <P@draigBrady.com>
Date:   Mon Sep 23 21:44:38 2024 +0100

But I don't quite understand what's wrong per se with the Date field.

The fsck output didn't show any errors in any commits; it only pointed to errors in tags (unless there was more output from fsck that you didn't copy into this ticket). You'll want to look at the tag objects.

But I don't quite understand what's wrong per se with the Date field.

Well, you're looking at the wrong object (a commit rather than a tag), and you're also not looking at the object but at git log's pretty-printing of the object which can modify it a fair amount. Try instead git cat-file -p ${HASH_OF_INTERESTING_OBJECT}. For example, git cat-file -p a6727941433ee1c91a20ede6cb381af1d18c566d. (Technically, git cat-file -p also pretty-prints, but it only strips the object type and size from the tags & commits & blobs, leaving any of those types of objects basically verbatim. Trees are a different story -- those are modified more -- but no trees were identified as a problem by fsck.)
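Following that suggestion, here is a minimal Python sketch that pulls each flagged tag with `git cat-file -p` and checks its tagger line against a rough approximation of the fsck rule behind `missingSpaceBeforeDate`: the byte right after the closing `>` of the email must be a space. It does not reproduce fsck's full ident validation, and `tagger_line`, `missing_space_before_date`, and `show_tagger` are hypothetical helper names.

```python
import subprocess
from typing import Optional


def tagger_line(raw_tag: bytes) -> Optional[bytes]:
    """Return the 'tagger' header line of a tag object's text, if present."""
    header = raw_tag.split(b"\n\n", 1)[0]  # headers end at the first blank line
    for line in header.split(b"\n"):
        if line.startswith(b"tagger "):
            return line
    return None


def missing_space_before_date(ident: bytes) -> bool:
    """Roughly mirror fsck: the byte after the closing '>' must be a space.

    A line that simply ends after '>' (no date at all) is flagged too.
    """
    close = ident.find(b">")
    return close != -1 and ident[close + 1:close + 2] != b" "


def show_tagger(oid: str) -> Optional[bytes]:
    """Print-friendly helper: fetch a tag via git cat-file -p and return its tagger line."""
    raw = subprocess.run(["git", "cat-file", "-p", oid],
                         check=True, capture_output=True).stdout
    return tagger_line(raw)
```

For instance, calling `show_tagger("a6727941433ee1c91a20ede6cb381af1d18c566d")` inside the clone would show the exact tagger line fsck objected to.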

gl-yziquel commented 1 month ago

The fsck output didn't show any errors in any commits; it only pointed to errors in tags (unless there was more output from fsck that you didn't copy into this ticket). You'll want to look at the tag objects.

That was the full output.

newren commented 5 days ago

The fsck output didn't show any errors in any commits; it only pointed to errors in tags (unless there was more output from fsck that you didn't copy into this ticket). You'll want to look at the tag objects.

That was the full output.

Yes, and the fsck output only showed errors in tags, but for some reason you then added this:

commit ac5213acbae8ab2e17589acc7c89b88b2d0e62ef (HEAD -> master, origin/master, origin/HEAD)
Author: Pádraig Brady <P@draigBrady.com>
Date:   Mon Sep 23 21:44:38 2024 +0100

which is a commit, not a tag, so it has nothing to do with the fsck errors. You need to inspect the objects that fsck said failed, not some other random unrelated object. See my previous comment for how to investigate those objects.