rtyley / bfg-repo-cleaner

Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala
https://rtyley.github.io/bfg-repo-cleaner/
GNU General Public License v3.0

Jgit MissingObjectException when deleting files #38

Open sgandon opened 10 years ago

sgandon commented 10 years ago

Hello Roberto, I have an issue when trying the following command line

java -jar ../../talend-svn-git-migration/bfg-1.11.2.jar --delete-files '{*.jar,*.zip,*.war,*.jpg,*.png,*.jpeg}'

I have run bfg on it before and done the necessary cleaning that you mention on your website:

$ git reflog expire --expire=now --all
$ git gc --prune=now --aggressive

but I get the following error :

This repo has been processed by The BFG before! Will prune repo before proceeding - to avoid unnecessary cleaning work on unused objects...
Exception in thread "main" org.eclipse.jgit.api.errors.JGitInternalException: Garbage collection failed.
        at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:126)
        at com.madgag.git.bfg.cli.Main$$anonfun$1.apply(Main.scala:49)
        at com.madgag.git.bfg.cli.Main$$anonfun$1.apply(Main.scala:34)
        at scala.Option.map(Option.scala:145)
        at com.madgag.git.bfg.cli.Main$delayedInit$body.apply(Main.scala:33)
        at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
        at scala.App$$anonfun$main$1.apply(App.scala:71)
        at scala.App$$anonfun$main$1.apply(App.scala:71)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
        at scala.App$class.main(App.scala:71)
        at com.madgag.git.bfg.cli.Main$.main(Main.scala:27)
        at com.madgag.git.bfg.cli.Main.main(Main.scala)
Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing unknown ad8d81b862528c5cacf247e2dc4d71e4ef1a3cb8
        at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:148)
        at org.eclipse.jgit.lib.ObjectReader$1.open(ObjectReader.java:302)
        at org.eclipse.jgit.revwalk.RevWalk$2.next(RevWalk.java:921)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.findObjectsToPack(PackWriter.java:1698)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.preparePack(PackWriter.java:797)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.preparePack(PackWriter.java:760)
        at org.eclipse.jgit.internal.storage.file.GC.writePack(GC.java:675)
        at org.eclipse.jgit.internal.storage.file.GC.repack(GC.java:531)
        at org.eclipse.jgit.internal.storage.file.GC.gc(GC.java:164)
        at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:123)
        ... 13 more
rtyley commented 10 years ago

Interesting- I wonder if it's associated with any of this stuff:

https://code.google.com/p/gerrit/issues/detail?id=2025 https://git.eclipse.org/r/#/c/21094/

I'd like to be able to reproduce this issue - and I don't think you can share this code with me? - so can I just check what sequence of commands you executed?

  1. Some other BFG command - what was that? Was it --strip-blobs-bigger-than 1M?
  2. Subsequently, the next BFG command: java -jar ../../talend-svn-git-migration/bfg-1.11.2.jar --delete-files '{*.jar,*.zip,*.war,*.jpg,*.png,*.jpeg}'

The code that's dying is just a pre-run clean-up that ensures your repo has been properly repacked before the main BFG operation starts. It only runs if The BFG detects that you've previously run The BFG on the repo (running The BFG typically generates many loose objects, which need to be repacked).
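To see how many loose objects a repo is carrying before a second run, something like the following could be used. It is a minimal sketch that only relies on standard `git count-objects`; the function name `loose_object_count` is made up for illustration and is not part of The BFG.

```shell
# Print the number of loose (unpacked) objects in a repository.
# Usage: loose_object_count [repo_dir]   (defaults to the current directory)
# Note: `loose_object_count` is a hypothetical helper name.
loose_object_count() {
    # `git count-objects -v` prints a line "count: N", where N is the
    # number of loose objects; awk picks out that N.
    git -C "${1:-.}" count-objects -v | awk '$1 == "count:" { print $2 }'
}
```

A large number here after a BFG run is expected; it should drop once `git gc --prune=now` has repacked the repo.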

As a workaround, and in fact as the preferred pattern of behaviour, you should run The BFG in a single run, starting with your original repository and combining all the switches you want to take effect in a single hit, i.e.:

--strip-blobs-bigger-than 1M --delete-files '{*.jar,*.zip,*.war,*.jpg,*.png,*.jpeg}'

A nice benefit of this is that it will ensure that you only get one Former-Commit-Id header in the commit message of each commit in your repo, rather than the two or more you get by running successive BFG operations.
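The single-run pattern could be wrapped up roughly like this. It is only a sketch: the function name `bfg_single_pass`, the `BFG_JAR` variable and the default `bfg.jar` path are assumptions for illustration, and it should be pointed at a fresh clone of the original repository.

```shell
# Run The BFG once with every switch combined, then expire reflogs and gc.
# Usage: bfg_single_pass <repo_dir> <bfg switches...>
# `bfg_single_pass` and BFG_JAR are hypothetical names, not part of The BFG.
bfg_single_pass() {
    repo_dir=$1
    shift
    bfg_jar=${BFG_JAR:-bfg.jar}   # assumed location of the BFG jar

    # One BFG invocation carrying all the switches, with the repo as last argument.
    java -jar "$bfg_jar" "$@" "$repo_dir" || return 1

    # The clean-up commands quoted earlier in this thread.
    git -C "$repo_dir" reflog expire --expire=now --all &&
        git -C "$repo_dir" gc --prune=now --aggressive
}
```

For the case in this issue, that would be called as `bfg_single_pass my-repo.git --strip-blobs-bigger-than 1M --delete-files '{*.jar,*.zip,*.war,*.jpg,*.png,*.jpeg}'`, where `my-repo.git` stands in for your own clone.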

rtyley commented 10 years ago

Were you running this second command on a fresh clone of your repo? What did you use to clone it? Standard git (please supply version number) or a JGit-based tool like IntelliJ?

cmoulton commented 9 years ago

I've just seen this issue running the sequence of commands:

  1. java -jar bfg-1.12.3.jar --strip-blobs-bigger-than 99M
  2. java -jar bfg-1.12.3.jar --delete-folders win

Here's the error info:

This repo has been processed by The BFG before! Will prune repo before proceeding - to avoid unnecessary cleaning work on unused objects...
Exception in thread "main" org.eclipse.jgit.api.errors.JGitInternalException: Garbage collection failed.
        at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:126)
        at com.madgag.git.bfg.cli.Main$$anonfun$1.apply(Main.scala:49)
        at com.madgag.git.bfg.cli.Main$$anonfun$1.apply(Main.scala:34)
        at scala.Option.map(Option.scala:146)
        at com.madgag.git.bfg.cli.Main$.delayedEndpoint$com$madgag$git$bfg$cli$Main$1(Main.scala:33)
        at com.madgag.git.bfg.cli.Main$delayedInit$body.apply(Main.scala:27)
        at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
        at scala.App$class.main(App.scala:76)
        at com.madgag.git.bfg.cli.Main$.main(Main.scala:27)
        at com.madgag.git.bfg.cli.Main.main(Main.scala)
Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing blob f66a6278185896d2fae6c4e9576450ff5d55fefa
        at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:149)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeWholeObjectDeflate(PackWriter.java:1562)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1548)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObject(PackWriter.java:1491)
        at org.eclipse.jgit.internal.storage.pack.PackOutputStream.writeObject(PackOutputStream.java:164)
        at org.eclipse.jgit.internal.storage.file.WindowCursor.writeObjects(WindowCursor.java:196)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjects(PackWriter.java:1479)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjects(PackWriter.java:1467)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writePack(PackWriter.java:1036)
        at org.eclipse.jgit.internal.storage.file.GC.writePack(GC.java:721)
        at org.eclipse.jgit.internal.storage.file.GC.repack(GC.java:555)
        at org.eclipse.jgit.internal.storage.file.GC.gc(GC.java:166)
        at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:123)
        ... 14 more

cchamberlain commented 9 years ago

I seem to get this same issue whenever I run BFG on a repo that I've already run it on, even after running the reflog command and force-pushing to origin.

I'm running on Win 8.1 with msys2.

ScotterC commented 9 years ago

I have this issue on a Mac:

bfg --strip-biggest-blobs 10
git reflog expire --expire=now --all && git gc --prune=now --aggressive
bfg --strip-biggest-blobs 10

yielded:

This repo has been processed by The BFG before! Will prune repo before proceeding - to avoid unnecessary cleaning work on unused objects...
Exception in thread "main" org.eclipse.jgit.api.errors.JGitInternalException: Garbage collection failed.

Is there a way to trick the repo into thinking BFG hasn't been run yet?

cofyc commented 9 years ago

I got the same issue as @cchamberlain.

rtyley commented 9 years ago

@Cofyc & @ScotterC, could you try the latest version of the BFG, v1.12.4? It's downloadable here:

http://repo1.maven.org/maven2/com/madgag/bfg/1.12.4/bfg-1.12.4.jar

This version is merely an update to the latest version of the underlying JGit library (which is responsible for the part of the code which is crashing, in the prune operation). It would be good to know if that fixes the problem.

Please note that this latest version of the BFG now requires Java 7, rather than Java 6 (as a result of using the latest version of the JGit library). If you're using Java 6 with this new version of the BFG, you'll get a java.lang.UnsupportedClassVersionError.

See https://support.apple.com/en-gb/HT204036 for current Java installation instructions.

rtyley commented 9 years ago

Incidentally, I have tried - unsuccessfully - to reproduce this issue myself. For instance this sequence of commands (and the prune in second run of the bfg) succeeds on my Mac:

git clone --mirror git@github.com:rails/rails.git
cd rails.git/
bfg --strip-biggest-blobs 10
git reflog expire --expire=now --all && git gc --prune=now --aggressive
bfg --strip-biggest-blobs 10

It would be very helpful if someone could share a repo with me to help reproduce this issue.

ScotterC commented 9 years ago

@rtyley Seems to have fixed it for me. Thanks!

whiddershins commented 9 years ago

@rtyley I definitely have a repo with that problem, running the latest version of everything. But it is a 10GB repo. How would you suggest I share it?

smihulet commented 9 years ago

Can we use an online drive or cloud account?

whiddershins commented 9 years ago

I have it in Dropbox. I can also put a .zip on AWS. I think some head or ref was corrupted, because subsequent clones don't have the same issue, which implicates JGit. But what about a flag that skips that step?

rtyley commented 9 years ago

> @rtyley I definitely have a repo with that problem, running the latest version of everything. But it is a 10GB repo. How would you suggest I share it?

Zip the repo up into a single file, and then share it with an online drive, eg Google Drive or Dropbox.

whiddershins commented 9 years ago

I corrected the problem by deleting and starting over. I think it was a result of making a commit partway through the process. This repo was so big that I also had to run the --delete-files command repeatedly with identical match terms, because it wouldn't get all the matches the first time. Sometimes it took 3 passes.
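One way to check whether a --delete-files pass left any matches behind is to list every path that still appears anywhere in history. The following is a rough sketch using plain git; the function name `history_paths_matching` is invented for illustration. Keep in mind that The BFG protects the contents of your latest commit by default, so files still present in HEAD will continue to show up.

```shell
# List the distinct paths touched anywhere in history that match a regex.
# Usage: history_paths_matching <extended-regex> [repo_dir]
# `history_paths_matching` is a hypothetical helper name.
history_paths_matching() {
    # --name-only lists the paths changed by each commit on every ref;
    # the empty format suppresses the commit headers themselves.
    git -C "${2:-.}" log --all --pretty=format: --name-only |
        grep -E -- "$1" | sort -u
}
```

For example, `history_paths_matching '\.(jar|zip|war)$'` printing nothing would suggest no further pass is needed for those extensions.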

rBatt commented 9 years ago

I had a very similar error just crop up. I was trying to convert to Git LFS.

I ran

java -jar ~/bfg-1.12.5.jar --convert-to-git-lfs '*.{RData,csv,nc,DAT,txt}' --no-blob-protection
git reflog expire --expire=now --all && git gc --prune=now --aggressive

That worked fine. Then I realized that I wanted to do the same for more file types, so then I ran:

java -jar ~/bfg-1.12.5.jar --convert-to-git-lfs '*.{zip,fwf}' --no-blob-protection

And got this:

This repo has been processed by The BFG before! Will prune repo before proceeding - to avoid unnecessary cleaning work on unused objects...
Exception in thread "main" org.eclipse.jgit.api.errors.JGitInternalException: Garbage collection failed.
    at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:192)
    at com.madgag.git.bfg.cli.Main$$anonfun$1.apply(Main.scala:49)
    at com.madgag.git.bfg.cli.Main$$anonfun$1.apply(Main.scala:34)
    at scala.Option.map(Option.scala:146)
    at com.madgag.git.bfg.cli.Main$.delayedEndpoint$com$madgag$git$bfg$cli$Main$1(Main.scala:33)
    at com.madgag.git.bfg.cli.Main$delayedInit$body.apply(Main.scala:27)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at com.madgag.git.bfg.cli.Main$.main(Main.scala:27)
    at com.madgag.git.bfg.cli.Main.main(Main.scala)
Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing unknown 36e526e78c085009eb9bf5479d7c27e2a5ac0238
    at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:145)
    at org.eclipse.jgit.lib.ObjectReader$1.open(ObjectReader.java:299)
    at org.eclipse.jgit.revwalk.RevWalk$2.next(RevWalk.java:971)
    at org.eclipse.jgit.internal.storage.pack.PackWriter.findObjectsToPack(PackWriter.java:1625)
    at org.eclipse.jgit.internal.storage.pack.PackWriter.preparePack(PackWriter.java:719)
    at org.eclipse.jgit.internal.storage.pack.PackWriter.preparePack(PackWriter.java:682)
    at org.eclipse.jgit.internal.storage.file.GC.writePack(GC.java:713)
    at org.eclipse.jgit.internal.storage.file.GC.repack(GC.java:568)
    at org.eclipse.jgit.internal.storage.file.GC.gc(GC.java:169)
    at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:175)
    ... 14 more
rtyley commented 9 years ago

@rBatt, are you able to share a copy of the original unmodified repo with me?

rBatt commented 9 years ago

@rtyley Thanks for the response -- unfortunately, I gave up and reverted to a backup I'd made. If I run into this again, I won't ditch the repo.

rtyley commented 9 years ago

> Thanks for the response -- unfortunately, I gave up and reverted to a backup I'd made. If I run into this again, I won't ditch the repo.

Out of interest, starting from your backup, would you be able to repeat your steps with the two BFG runs you detailed above and see if the issue reproduces? Also, what operating system were you running this on by the way?

rtyley commented 9 years ago

As several users have encountered this bug - which unfortunately seems to occur in the underlying JGit library when doing Git GC, rather than the BFG itself - I've opened a bug with the JGit project here:

https://bugs.eclipse.org/bugs/show_bug.cgi?id=479697

Without a sample repo to reproduce this issue against, this will probably be quite tricky to fix, but they're clever so maybe one of them will know something. In the meantime, issue #115 is looking at ways to avoid doing a Git GC at all.

eggonabull commented 8 years ago

I encountered an issue with a very similar stack trace: a JGitInternalException caused by Missing unknown.

I did an expire and aggressive prune, then ran grep for the missing hash and saw:

$ grep -r THEMISSINGCOMMITHASH .git
.git/ORIG_HEAD:THEMISSINGCOMMITHASH 

For me, the issue was fixed by updating my .git/ORIG_HEAD to point to my current commit. (ORIG_HEAD is used by git to save the previous location of HEAD so that you can recover from risky commands.) I'd surmise that since you're deleting refs, the commit referred to by ORIG_HEAD is getting deleted, and JGit is falling on its face.

Edit: Or more likely it's git's expire and prune that's failing to update ORIG_HEAD, but JGit assumes it will point to something sane.

stromblom commented 8 years ago

Seems like what @eggonabull says might be the problem. I found that not only did my ORIG_HEAD point to a faulty commit, my FETCH_HEAD did too. Using grep to find the file referencing the commit seems like the best solution for now.

Edit: I'm using bfg 1.12.8
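The grep-and-edit workaround described in the last two comments can be sketched as a pair of small shell functions. This is only an illustration built on standard git commands: the names `find_stale_heads` and `repoint_orig_head` are made up, and it only looks at ORIG_HEAD and FETCH_HEAD, the two files mentioned in this thread.

```shell
# Print "<file> <hash>" for every hash recorded in ORIG_HEAD or FETCH_HEAD
# that names an object missing from the object database.
# `find_stale_heads` is a hypothetical helper name.
find_stale_heads() {
    git_dir=$(git rev-parse --git-dir) || return 1
    for name in ORIG_HEAD FETCH_HEAD; do
        [ -f "$git_dir/$name" ] || continue
        # The hash is the first whitespace-separated field on each line.
        awk '{ print $1 }' "$git_dir/$name" | while read -r hash; do
            [ -n "$hash" ] || continue
            # `git cat-file -e` exits non-zero when the object does not exist.
            git cat-file -e "$hash" 2>/dev/null || echo "$name $hash"
        done
    done
}

# Point ORIG_HEAD at the current commit, mirroring the manual edit above.
# `repoint_orig_head` is a hypothetical helper name.
repoint_orig_head() {
    git_dir=$(git rev-parse --git-dir) || return 1
    git rev-parse HEAD > "$git_dir/ORIG_HEAD"
}
```

Run `find_stale_heads` from inside the repo first; if it reports ORIG_HEAD, `repoint_orig_head` rewrites that file. A stale FETCH_HEAD is only reported, not fixed, by this sketch.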

SamBallantyne commented 8 years ago

Issue is occurring for me in 1.12.12

gac55 commented 8 years ago

I would also like to echo @eggonabull. My ORIG_HEAD pointed to the commit causing the issue. I used grep to find the file and then nano to rewrite it with the correct hash. I was using 1.12.12.

javabrett commented 8 years ago

JGit bug is now marked as fixed for future 4.4. http://git.eclipse.org/c/jgit/jgit.git/commit/?id=6590c0a92ac987489dfa49281a20e5ea956e043d

simonecorsi commented 8 years ago

Just wanted to say that @eggonabull's solution fixed it for me. Thanks, I had been banging my head against this for the last couple of days.

javabrett commented 8 years ago

This issue should now be closed as it is fixed by the JGit 4.4 upgrade in BFG 1.12.13.

diurnalist commented 7 years ago

@javabrett unfortunately I have encountered this error in BFG 1.12.14. Repro steps were the same: first, I had a repo where I ran bfg --convert-to-git-lfs '*.cs', then I ran bfg --delete-files '*.cs.bak', performing the prune and gc steps in between as recommended.

However, the solution to "fix" the git file system does not work for me, as the missing commit doesn't seem to be referenced anywhere in git. That is, grep -r <sha> .git doesn't find any results.

`bfg` error log

```shell
This repo has been processed by The BFG before! Will prune repo before proceeding - to avoid unnecessary cleaning work on unused objects...
Exception in thread "main" org.eclipse.jgit.api.errors.JGitInternalException: Garbage collection failed.
    at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:192)
    at com.madgag.git.bfg.cli.Main$$anonfun$1.apply(Main.scala:49)
    at com.madgag.git.bfg.cli.Main$$anonfun$1.apply(Main.scala:34)
    at scala.Option.map(Option.scala:146)
    at com.madgag.git.bfg.cli.Main$.delayedEndpoint$com$madgag$git$bfg$cli$Main$1(Main.scala:33)
    at com.madgag.git.bfg.cli.Main$delayedInit$body.apply(Main.scala:27)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at com.madgag.git.bfg.cli.Main$.main(Main.scala:27)
    at com.madgag.git.bfg.cli.Main.main(Main.scala)
Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing commit 072c1e0fe9e4b930af294a4eab71ad967e8281e3
    at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:159)
    at org.eclipse.jgit.revwalk.RevWalk.getCachedBytes(RevWalk.java:903)
    at org.eclipse.jgit.revwalk.RevCommit.parseHeaders(RevCommit.java:155)
    at org.eclipse.jgit.revwalk.PendingGenerator.next(PendingGenerator.java:147)
    at org.eclipse.jgit.revwalk.StartGenerator.next(StartGenerator.java:184)
    at org.eclipse.jgit.revwalk.RevWalk.next(RevWalk.java:435)
    at org.eclipse.jgit.revwalk.ObjectWalk.next(ObjectWalk.java:293)
    at org.eclipse.jgit.internal.storage.pack.PackWriterBitmapWalker.findObjects(PackWriterBitmapWalker.java:120)
    at org.eclipse.jgit.internal.storage.pack.PackWriter.prepareBitmapIndex(PackWriter.java:2026)
    at org.eclipse.jgit.internal.storage.file.GC.writePack(GC.java:810)
    at org.eclipse.jgit.internal.storage.file.GC.repack(GC.java:595)
    at org.eclipse.jgit.internal.storage.file.GC.gc(GC.java:176)
    at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:175)
    ... 14 more
```

Unfortunately I can't provide a test repo.

javabrett commented 7 years ago

@diurnalist are you able to discern anything special about commit 072c1e0fe9e4b930af294a4eab71ad967e8281e3? It's worth grepping through your .git directory to see if it shows up anywhere interesting.

diurnalist commented 7 years ago

@javabrett that's the odd part, I did a grep -r 072c1e0fe9e4b930af294a4eab71ad967e8281e3 .git and it did not turn up anything.

I was able to get around my problem by starting over and running both of the bfg operations simultaneously, though.

javabrett commented 7 years ago

I hope that the original issue that caused "Jgit MissingObjectException" was resolved with the JGit upgrade, and am quite confident it was. So I think this is becoming a metabug for different "Jgit MissingObjectException" occurrences. I accept that they look similar.

The currently most-likely candidate JGit issue is https://bugs.eclipse.org/bugs/show_bug.cgi?id=496512 . It is promising because it seems like this could be related to JGit meeting an IOException that it can't understand, which might cause it to silently mark a Git pack bad, which could cause this problem.

What do you get for git show 072c1e0fe9e4b930af294a4eab71ad967e8281e3, and is that commit pre-BFG-rewrite (original history) or post-rewrite (new, rewritten history)?
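A rough way to answer both questions at once is sketched below. It assumes the footer that The BFG adds to rewritten commit messages reads "Former-commit-id: <sha>" (the exact text is an assumption, so the match is case-insensitive), and the function name `describe_commit_id` is made up for illustration.

```shell
# Report whether an id exists as a commit, and whether any commit message
# records it as a former (pre-rewrite) id.
# `describe_commit_id` is a hypothetical helper; the "Former-commit-id:"
# footer text is an assumption about what The BFG writes.
describe_commit_id() {
    sha=$1
    if git cat-file -e "${sha}^{commit}" 2>/dev/null; then
        echo "present: $sha is a commit in this repo"
    else
        echo "missing: $sha is not in the object database"
    fi
    # -i ignores case, since the exact casing of the footer is assumed.
    rewritten=$(git log --all -i --grep="Former-commit-id: $sha" --format=%H)
    if [ -n "$rewritten" ]; then
        echo "pre-rewrite: recorded as the former id of $rewritten"
    fi
}
```

If the id is reported as missing but also as a former id, that would point at original history that was pruned after the rewrite, rather than at a damaged rewritten commit.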

The JGit issue suggests "increase the log level for ObjectDirectory" to learn what the IOException is, so if we can figure out how to get that into the BFG config or build, that might help. Can you check whether there are any unusual IO conditions with your pack files, especially tight disk space?