newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

Question: Does git-filter-repo fix badDate error? #420

Closed ppi-agray closed 4 months ago

ppi-agray commented 1 year ago

When we run git fsck, we get an error that we are trying to fix:

$ git fsck
Checking object directories: 100% (256/256), done.
error in commit 60707e738f5b4330147fff34d7ddc734eea4a577: missingSpaceBeforeDate: invalid author/committer line - missing space before date
error in commit c60d233e8296f9c7a1f6e50719e59bac6fdd272f: badDate: invalid author/committer line - bad date
Checking objects: 100% (163912/163912), done.

We've tried everything we can think of with no luck; we cannot fix this.
For instance, if we try git rebase --interactive 'c60d233e8296f9c7a1f6e50719e59bac6fdd272f^', c60d233 does not appear in the list of commits to edit, so there is nothing for us to "fix".

We are wondering if this tool can fix this, and if so how? Thanks!!!

me-and commented 1 year ago

I don't believe there's any function in git-filter-repo that will fix this for you.

If you knew how to fix this problem for a single commit, git-filter-repo might be useful for applying the same change to a lot of commits at once, but it sounds like you have a single duff commit that's causing problems elsewhere, rather than a lot of duff commits that need fixing in bulk.

If the problem is an open source repository, I'll happily take a look to see if I can sort it out for you; this one's a new problem to me, but I've a reasonable idea how I'd go about sorting it. If it's closed-source, I won't do it for free, but I'd potentially be willing to sign an NDA to have a look on a no-fix-no-fee basis. Either is clearly off-topic for this project, so if you want to do that, drop me an email: my full name's on my profile page and my email address is <firstname>@<lastname>.org.

newren commented 1 year ago

> For instance, if we try git rebase --interactive 'c60d233e8296f9c7a1f6e50719e59bac6fdd272f^'

Yeah, I wouldn't expect that to work in general. It presumes that c60d233e8296f9c7a1f6e50719e59bac6fdd272f is an ancestor of your current commit. Also, rebase can only ever fix one branch even if that commit is an ancestor of your current commit, whereas you really need to fix all branches/tags that rely upon this commit.
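To see how far the problem reaches before picking a tool, something like the sketch below may help. It checks whether a commit is an ancestor of HEAD and lists every ref whose history contains it. The function name refs_affected_by is made up for illustration, and I have not verified how these commands behave on a commit that fsck flags, so treat it as a starting point only.

```shell
# Hypothetical helper (name is illustrative, not part of git or filter-repo).
# Usage: refs_affected_by <commit>
refs_affected_by() {
  commit=$1
  # A rebase started from the current branch can only reach ancestors of HEAD.
  if git merge-base --is-ancestor "$commit" HEAD; then
    echo "ancestor of HEAD: yes"
  else
    echo "ancestor of HEAD: no"
  fi
  # Every branch, tag, or other ref whose history includes the commit.
  git for-each-ref --contains "$commit" --format='%(refname)'
}
```

Any ref printed by the second command would need rewriting, which is more than a single rebase covers.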

> We are wondering if this tool can fix this, and if so how?

filter-repo doesn't have anything to directly fix this; it currently dies if the dates don't parse. But it might be useful as part of a solution, in particular with replace objects.

I made a simple reproduction of the first error with a new repo with just one commit. In it, a git fsck shows me:

$ git fsck
error in commit 166f57b3fbe31257100361ecaf735f305b533b21: missingSpaceBeforeDate: invalid author/committer line - missing space before date
Checking object directories: 100% (256/256), done.

I tried running git fast-export --no-walk --reference-excluded-parents --no-data 166f57b3fbe31257100361ecaf735f305b533b21 but found that it just preserved the bad date information, and piping that output to fast-import makes fast-import crash. filter-repo chokes on the same bad line from fast-export that fast-import chokes on.

However...

If I run git cat-file -p 166f57b3fbe31257100361ecaf735f305b533b21 >tmp, then I get a copy of the bad commit. I can edit it to add the missing space, e.g. changing something like

tree e1d871155fce791680ec899fe7869067f2b4ffd2
author My Name <my@email.com>1673287380 -0800
committer My Name <my@email.com> 1673287380 -0800

Initial

into

tree e1d871155fce791680ec899fe7869067f2b4ffd2
author My Name <my@email.com> 1673287380 -0800
committer My Name <my@email.com> 1673287380 -0800

Initial

(notice the space added between the author email and the number representing the Unix epoch). Then I can do the following:

$ git replace -f 166f57b3fbe31257100361ecaf735f305b533b21 $(git hash-object -t commit -w tmp)
$ git filter-repo --force
$ git commit-graph write

After this sequence, I see:

$ git fsck
Checking object directories: 100% (256/256), done.
Checking objects: 100% (3/3), done.
Verifying commits in commit graph: 100% (1/1), done.
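The hand edit for the missingSpaceBeforeDate case could probably be scripted too. The following is a minimal sketch, assuming the only defect is a missing space between the closing ">" of the email and the timestamp. The function name fix_missing_space is hypothetical, and the pattern would also touch a commit message line that happens to look like a malformed author line, so review the output before using it.

```shell
# Hypothetical helper: insert the missing space between "<email>" and the
# timestamp on author/committer lines; all other lines pass through unchanged.
fix_missing_space() {
  sed -E 's/^((author|committer) .*>)([0-9]+ [+-][0-9]{4})$/\1 \3/'
}

# Sketch of use (commented out because it modifies the repository):
# bad=166f57b3fbe31257100361ecaf735f305b533b21
# git cat-file -p "$bad" | fix_missing_space >tmp
# git replace -f "$bad" "$(git hash-object -t commit -w tmp)"
```

A badDate error can have several different causes, so this sketch does not attempt to handle it; that one likely still needs a manual look at the commit.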

In your case, you'd have two commits to fix, so I'd expect it to look something like this:

$ git cat-file -p 60707e738f5b4330147fff34d7ddc734eea4a577 >tmp1
$ git cat-file -p c60d233e8296f9c7a1f6e50719e59bac6fdd272f >tmp2

Then edit both tmp1 and tmp2 to fix the dates so they no longer have their original respective problems. Then run:

$ git replace -f 60707e738f5b4330147fff34d7ddc734eea4a577 $(git hash-object -t commit -w tmp1)
$ git replace -f c60d233e8296f9c7a1f6e50719e59bac6fdd272f $(git hash-object -t commit -w tmp2)
$ git filter-repo --force
$ git commit-graph write

However, note that this will rewrite all commits that depend upon either of those two bad commits and give them all new commit hashes. So this is rewriting history, and if it's public and shared history, you've got a flag day event to deal with. That's unavoidable if you want to fix historical bad commits, but you do need to make sure you understand the ramifications of rewriting history: force pushing, getting everyone who has a copy to discard it and get the new repo, etc.
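If fsck ever reports more than a couple of these, the extraction step could be looped rather than typed by hand. This is only a sketch: the function name bad_date_commits and the <hash>.txt file naming are made up for illustration, and it only matches the two error names shown in this thread.

```shell
# Hypothetical helper: read `git fsck` output on stdin and print the hash of
# each commit reported with missingSpaceBeforeDate or badDate.
bad_date_commits() {
  sed -n -E 's/^error in commit ([0-9a-f]+): (missingSpaceBeforeDate|badDate):.*/\1/p'
}

# Sketch of use: dump every reported commit to <hash>.txt for manual editing.
# git fsck 2>&1 | bad_date_commits | while read -r c; do
#   git cat-file -p "$c" >"$c.txt"
# done
```

Each edited file would then go through the same git replace -f step as above, followed by a single git filter-repo --force run.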

Anyway, hope that helps. If any of that doesn't make sense or you want someone who can look at your repository specifically, you may want to take @me-and up on their offer.

newren commented 4 months ago

No response, and I'm pretty sure my suggestion above was sufficient to fix the issue. Let's close this one out.