rtyley / bfg-repo-cleaner

Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala
https://rtyley.github.io/bfg-repo-cleaner/
GNU General Public License v3.0
10.83k stars 535 forks

Multiple Runs of BFG Overriding Each Other's Changes #517

Open jeran-urban opened 6 days ago

jeran-urban commented 6 days ago

In testing, I found that if I perform two different actions in two separate runs of BFG, and both happen to touch data on the same commit, the runs override each other and bad data survives between them.

Ex: Run 1

$ git clone --mirror git@github.com:myOrg/myRepo.git
$ bfg --delete-files "{file1.sql,file2.sql}" myRepo.git
$ cd myRepo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ git push --force

I then delete my local copies and the BFG mirrored repo.

Run 2, from scratch:

$ git clone --mirror git@github.com:myOrg/myRepo.git
$ bfg --delete-folders BadFolder myRepo.git
$ cd myRepo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ git push --force

I would expect the object-id-map.old-new.txt output to be:

run 1: originalHash -> HashA
run 2: HashA -> HashB

But instead, for those commits I see:

run 1: originalHash -> HashA
run 2: originalHash -> HashB
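The comparison above can be automated. This is a minimal sketch that lists every old id appearing on the left-hand side of both maps, which is the pattern described here. It assumes each line of a map file is `<old-id> <new-id>` separated by whitespace, and that a copy of each run's map was saved before deleting the local clone. The function name `shared_originals` and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper: print "<old-id> <new-id-run1> <new-id-run2>" for every
# old id that is rewritten in BOTH map files.
# Assumption: each map line has the form "<old-id> <new-id>".
shared_originals() {
  awk 'NR == FNR { first[$1] = $2; next }          # remember run 1 mappings
       ($1 in first) { print $1, first[$1], $2 }   # same old id seen in run 2
      ' "$1" "$2"
}

# Usage (file names are placeholders for saved copies of each run's map):
#   shared_originals run1-map.txt run2-map.txt
```

If run 2 had started from the output of run 1, the affected commits would not be expected in this listing, because run 2 would have had HashA on the left-hand side rather than originalHash.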

When I navigate to the commits I can see all three (I know that still seeing the original one is expected until GitHub runs gc on the remote). What I expected is:

originalHash = old content
HashA = originalHash - 2 files
HashB = originalHash - 2 files AND - the folder

What I see is:

originalHash = old content
HashA = originalHash - 2 files
HashB = originalHash (with the 2 files still showing) AND - the folder

So if I run BFG multiple times, it appears that neither run's cleanup is fully preserved?

My questions based on this are:

  1. Is this intended behavior?
  2. Is there a better way to handle this? (This is a back-to-back example, but I'm curious about the case where we fixed something a week ago, now find something else to fix, and don't realize we just undid the first fix.)
  3. Is this an extension of the GitHub gc issue? (Until the original reference is gone, is that what BFG will target, rather than the most recent version?)
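For question 3, one possible check is to list which refs in the fresh mirror clone still reach the original commit before running BFG the second time. A mirror clone fetches every ref the server advertises, not just branches and tags. So if any ref other than the rewritten branches still contains originalHash, that could explain why run 2 starts from it. This is only a diagnostic sketch, not a confirmed explanation. The function name `refs_containing` is hypothetical, and `<originalHash>` is a placeholder.

```shell
# Hypothetical helper: print every ref in the repository at $1 whose history
# contains the commit $2.
refs_containing() {
  git -C "$1" for-each-ref --contains="$2" --format='%(refname)'
}

# Usage, run right after the second `git clone --mirror`:
#   refs_containing myRepo.git <originalHash>
```

An empty result would suggest that no ref in the clone reaches the original commit any more. Any ref names printed are the ones worth inspecting.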

Thank you for your time!