rtyley / bfg-repo-cleaner

Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala
https://rtyley.github.io/bfg-repo-cleaner/
GNU General Public License v3.0

If the project has any PRs or tags on GitHub, none of these operations work #516

Status: Open. ykla opened this issue 1 month ago

ykla commented 1 month ago

Could the author please consider noting in the documentation that this tool, and even the git gc step itself, does not work for projects hosted on GitHub if the project has any PRs or tags?

I haven't tested the case where the PRs were only closed and never merged. My repository has both closed PRs and merged PRs, all of them submitted by me, and I own the entire project.

It is exactly what the title says: as long as your project is on GitHub and has any PR or tag, you cannot perform any gc operation, nor any --mirror-related operation. I think this may be a design issue on GitHub's side.

I tried contacting GitHub support. Both their bot and the human agents said they can only delete PRs that contain sensitive data. They refuse any other reason and ask you to state what the sensitive data is and where it is located.

Even without BFG, a plain git gc doesn't help: just as in the article, the push only appears to succeed. Clone the repository again and you'll see that nothing has changed.

I've tried almost every method mentioned in the other issues, and none of them currently works.

These include, but are not limited to, https://some-natalie.dev/blog/omit-PRs-clean-BFG

Removing large files from the repository with BFG only makes the push appear to succeed; nothing is actually pushed. Deleting the local copy and cloning again reveals this.
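To check what GitHub actually serves without trusting the push output, one option is to mirror-clone into a throwaway directory and list any blobs that are still above the threshold. The following is a minimal sketch, assuming git is on PATH and that BFG's "2M" means 2 MiB (2 * 1024 * 1024 bytes); the function names are hypothetical. It lists every object reachable from any ref (in a mirror clone that includes refs/pull/*) with git rev-list --objects --all, then asks git cat-file --batch-check for each object's type and size.

```python
import subprocess
import tempfile
from pathlib import Path

# Assumption: BFG's "--strip-blobs-bigger-than 2M" is treated here as 2 MiB.
THRESHOLD = 2 * 1024 * 1024


def parse_batch_check(output: str, threshold: int) -> list[tuple[str, int]]:
    """Parse default `git cat-file --batch-check` lines ("<sha> <type> <size>")
    and return (sha, size) for every blob strictly larger than `threshold`."""
    big = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3:  # e.g. "<sha> missing" lines are skipped
            continue
        sha, obj_type, size = parts
        if obj_type == "blob" and int(size) > threshold:
            big.append((sha, int(size)))
    return big


def large_blobs_in_fresh_clone(url: str, threshold: int = THRESHOLD) -> list[tuple[str, int]]:
    """Mirror-clone `url` into a temporary directory and report large blobs
    reachable from any ref, so the result reflects the remote, not a local copy."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "check.git"
        subprocess.run(["git", "clone", "--mirror", url, str(repo)], check=True)
        # One line per reachable object: "<sha>" or "<sha> <path>".
        objects = subprocess.run(
            ["git", "-C", str(repo), "rev-list", "--objects", "--all"],
            check=True, capture_output=True, text=True,
        ).stdout
        shas = "\n".join(line.split(" ", 1)[0] for line in objects.splitlines()) + "\n"
        # Look up type and size for each object id.
        checked = subprocess.run(
            ["git", "-C", str(repo), "cat-file", "--batch-check"],
            input=shas, check=True, capture_output=True, text=True,
        ).stdout
        return parse_batch_check(checked, threshold)
```

For example, if large_blobs_in_fresh_clone("https://github.com/taophilosophy/SEP-CN.git") still returns the autocorrect-node .node blobs after a "successful" push, the cleanup never reached GitHub.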

I did try the --no-blob-protection flag once, as the article does, but I've left that run out here for brevity, so please bear with me.

C:\Users\ykla\Desktop\SEP-CN>git clone --mirror  https://github.com/taophilosophy/SEP-CN.git
Cloning into bare repository 'SEP-CN.git'...
remote: Enumerating objects: 36652, done.
remote: Counting objects: 100% (36651/36651), done.
remote: Compressing objects: 100% (8545/8545), done.
remote: Total 36652 (delta 27378), reused 36555 (delta 27299), pack-reused 1
Receiving objects: 100% (36652/36652), 84.91 MiB | 15.97 MiB/s, done.
Resolving deltas: 100% (27378/27378), done.

C:\Users\ykla\Desktop\SEP-CN>java -jar C:\Users\ykla\Desktop\SEP-CN\bfg.jar --strip-blobs-bigger-than 2M SEP-CN.git

Using repo : C:\Users\ykla\Desktop\SEP-CN\SEP-CN.git

Scanning packfile for large blobs: 36652
Scanning packfile for large blobs completed in 208 ms.
Found 4 blob ids for large blobs - biggest=10114104 smallest=9900680
Total size (unpacked)=39948080
Found 2635 objects to protect
Found 23 commit-pointing refs : HEAD, refs/heads/main, refs/pull/1/head, ...

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit 167e5a0d (protected by 'HEAD')

Cleaning
--------

Found 3070 commits
Cleaning commits:       100% (3070/3070)
Cleaning commits completed in 451 ms.

Updating 6 Refs
---------------

        Ref                Before     After
        --------------------------------------
        refs/pull/1/head | 893e91f0 | cb5430e5
        refs/pull/2/head | e6d30510 | 640191ab
        refs/pull/3/head | bb15af62 | 51ee9aab
        refs/pull/4/head | 3cf332d1 | f3f0f301
        refs/pull/5/head | 5afaa66c | 1fd7ed94
        refs/pull/7/head | 0f682ae9 | 90b45a77

Updating references:    100% (6/6)
...Ref update completed in 25 ms.

Commit Tree-Dirt History
------------------------

        Earliest                                              Latest
        |                                                          |
        ..............................................DD....D.......

        D = dirty commits (file tree fixed)
        m = modified commits (commit message or parents changed)
        . = clean commits (no changes to file tree)

                                Before     After
        -------------------------------------------
        First modified commit | 893e91f0 | cb5430e5
        Last dirty commit     | 0f682ae9 | 90b45a77

Deleted files
-------------

        Filename                               Git id
        ---------------------------------------------------------------------------
        autocorrect-node.linux-x64-gnu.node  | 61db0d39 (9.6 MB), 4f04acea (9.4 MB)
        autocorrect-node.linux-x64-musl.node | c5bf586b (9.4 MB), aef00e88 (9.6 MB)

In total, 18 object ids were changed. Full details are logged here:

        C:\Users\ykla\Desktop\SEP-CN\SEP-CN.git.bfg-report\2024-05-25\11-06-02

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

C:\Users\ykla\Desktop\SEP-CN>cd SEP-CN.git

C:\Users\ykla\Desktop\SEP-CN\SEP-CN.git> git reflog expire --expire=now --all && git gc --prune=now --aggressive
Enumerating objects: 36652, done.
Counting objects: 100% (36652/36652), done.
Delta compression using up to 16 threads
Compressing objects: 100% (35840/35840), done.
Writing objects: 100% (36652/36652), done.
Selecting bitmap commits: 3055, done.
Building bitmaps: 100% (135/135), done.
Total 36652 (delta 27378), reused 9254 (delta 0), pack-reused 0 (from 0)

C:\Users\ykla\Desktop\SEP-CN\SEP-CN.git>git push --force
Everything up-to-date

C:\Users\ykla\Desktop\SEP-CN\SEP-CN.git>cd ..

C:\Users\ykla\Desktop\SEP-CN>git clone --mirror  https://github.com/taophilosophy/SEP-CN.git
Cloning into bare repository 'SEP-CN.git'...
remote: Enumerating objects: 36652, done.
remote: Counting objects: 100% (36651/36651), done.
remote: Compressing objects: 100% (8545/8545), done.
remote: Total 36652 (delta 27378), reused 36555 (delta 27299), pack-reused 1
Receiving objects: 100% (36652/36652), 84.91 MiB | 14.14 MiB/s, done.
Resolving deltas: 100% (27378/27378), done.

C:\Users\ykla\Desktop\SEP-CN>java -jar C:\Users\ykla\Desktop\SEP-CN\bfg.jar --strip-blobs-bigger-than 2M SEP-CN.git

Using repo : C:\Users\ykla\Desktop\SEP-CN\SEP-CN.git

Scanning packfile for large blobs: 36652
Scanning packfile for large blobs completed in 214 ms.
Found 4 blob ids for large blobs - biggest=10114104 smallest=9900680
Total size (unpacked)=39948080
Found 2635 objects to protect
Found 23 commit-pointing refs : HEAD, refs/heads/main, refs/pull/1/head, ...

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit 167e5a0d (protected by 'HEAD')

Cleaning
--------

Found 3070 commits
Cleaning commits:       100% (3070/3070)
Cleaning commits completed in 448 ms.

Updating 6 Refs
---------------

        Ref                Before     After
        --------------------------------------
        refs/pull/1/head | 893e91f0 | cb5430e5
        refs/pull/2/head | e6d30510 | 640191ab
        refs/pull/3/head | bb15af62 | 51ee9aab
        refs/pull/4/head | 3cf332d1 | f3f0f301
        refs/pull/5/head | 5afaa66c | 1fd7ed94
        refs/pull/7/head | 0f682ae9 | 90b45a77

Updating references:    100% (6/6)
...Ref update completed in 25 ms.

Commit Tree-Dirt History
------------------------

        Earliest                                              Latest
        |                                                          |
        ..............................................DD....D.......

        D = dirty commits (file tree fixed)
        m = modified commits (commit message or parents changed)
        . = clean commits (no changes to file tree)

                                Before     After
        -------------------------------------------
        First modified commit | 893e91f0 | cb5430e5
        Last dirty commit     | 0f682ae9 | 90b45a77

Deleted files
-------------

        Filename                               Git id
        ---------------------------------------------------------------------------
        autocorrect-node.linux-x64-gnu.node  | 61db0d39 (9.6 MB), 4f04acea (9.4 MB)
        autocorrect-node.linux-x64-musl.node | c5bf586b (9.4 MB), aef00e88 (9.6 MB)

In total, 18 object ids were changed. Full details are logged here:

        C:\Users\ykla\Desktop\SEP-CN\SEP-CN.git.bfg-report\2024-05-25\11-09-09

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

C:\Users\ykla\Desktop\SEP-CN\SEP-CN.git> git reflog expire --expire=now --all && git gc --prune=now --aggressive
Enumerating objects: 36652, done.
Counting objects: 100% (36652/36652), done.
Delta compression using up to 16 threads
Compressing objects: 100% (35840/35840), done.
Writing objects: 100% (36652/36652), done.
Selecting bitmap commits: 3055, done.
Building bitmaps: 100% (135/135), done.
Total 36652 (delta 27378), reused 9270 (delta 0), pack-reused 0 (from 0)

C:\Users\ykla\Desktop\SEP-CN\SEP-CN.git>git push --force
Everything up-to-date

In https://github.com/rtyley/bfg-repo-cleaner/issues/36, another user wrote:

I think that I accidentally found a solution (I'm not sure I can explain why it works). Same thing, I had PRs (closed) on GitHub and git push was being rejected.

  1. I cloned the local copy of my repo (using git clone --mirror but providing the path to my local directory).
  2. I cleaned this copy using BFG.
  3. git remote was showing origin as pointing to my local repo, so I removed origin and added it but this time pointing to my repo hosted on GitHub.
  4. From the clean mirror copy, I forced a push using git push -f --set-upstream origin master.

As far as I can tell, my repo in GitHub is now clean. Git pulling from my original repo doesn't work, but now I can easily delete that folder and clone a new copy.

Any ideas for/against doing it this way?
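For anyone who wants to reproduce the quoted workaround exactly, its four steps can be written down as a fixed command sequence. This is a minimal sketch: the paths, jar location and branch name are placeholders, the reflog/gc pair is BFG's own suggested follow-up rather than part of the quoted steps, and the helper names are hypothetical. It only reproduces the procedure; it does not make it succeed.

```python
import subprocess


def issue36_workaround(local_repo: str, mirror_dir: str, remote_url: str,
                       bfg_jar: str, branch: str = "main",
                       size: str = "2M") -> list[list[str]]:
    """Return the quoted workaround as argv lists (no shell involved)."""
    return [
        # 1. Mirror-clone the *local* repository instead of GitHub.
        ["git", "clone", "--mirror", local_repo, mirror_dir],
        # 2. Clean the copy with BFG.
        ["java", "-jar", bfg_jar, "--strip-blobs-bigger-than", size, mirror_dir],
        # BFG's suggested follow-up, split into two commands since no shell is used.
        ["git", "-C", mirror_dir, "reflog", "expire", "--expire=now", "--all"],
        ["git", "-C", mirror_dir, "gc", "--prune=now", "--aggressive"],
        # 3. Repoint origin from the local path to the GitHub repository.
        ["git", "-C", mirror_dir, "remote", "remove", "origin"],
        ["git", "-C", mirror_dir, "remote", "add", "origin", remote_url],
        # 4. Force-push the branch and set its upstream.
        ["git", "-C", mirror_dir, "push", "-f", "--set-upstream", "origin", branch],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order; check=True raises on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

The quoted comment pushes master; the run below uses main, which is why branch is a parameter.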

If you have any merged branches, this only makes it look like the push worked, exactly as with git clone --bare. Push, then clone again, and you'll see that nothing has changed.

C:\Users\ykla\Desktop\SEP-CN\test>git clone --mirror  https://github.com/taophilosophy/SEP-CN.git C:\Users\ykla\Desktop\SEP-CN/test
C:\Users\ykla\Desktop\SEP-CN\test>java -jar C:\Users\ykla\Desktop\SEP-CN\bfg.jar --strip-blobs-bigger-than 2M test

Using repo : C:\Users\ykla\Desktop\SEP-CN\test

Scanning packfile for large blobs: 36652
Scanning packfile for large blobs completed in 225 ms.
Found 4 blob ids for large blobs - biggest=10114104 smallest=9900680
Total size (unpacked)=39948080
Found 2635 objects to protect
Found 23 commit-pointing refs : HEAD, refs/heads/main, refs/pull/1/head, ...

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit 167e5a0d (protected by 'HEAD')

Cleaning
--------

Found 3070 commits
Cleaning commits:       100% (3070/3070)
Cleaning commits completed in 445 ms.

Updating 6 Refs
---------------

        Ref                Before     After
        --------------------------------------
        refs/pull/1/head | 893e91f0 | cb5430e5
        refs/pull/2/head | e6d30510 | 640191ab
        refs/pull/3/head | bb15af62 | 51ee9aab
        refs/pull/4/head | 3cf332d1 | f3f0f301
        refs/pull/5/head | 5afaa66c | 1fd7ed94
        refs/pull/7/head | 0f682ae9 | 90b45a77

Updating references:    100% (6/6)
...Ref update completed in 26 ms.

Commit Tree-Dirt History
------------------------

        Earliest                                              Latest
        |                                                          |
        ..............................................DD....D.......

        D = dirty commits (file tree fixed)
        m = modified commits (commit message or parents changed)
        . = clean commits (no changes to file tree)

                                Before     After
        -------------------------------------------
        First modified commit | 893e91f0 | cb5430e5
        Last dirty commit     | 0f682ae9 | 90b45a77

Deleted files
-------------

        Filename                               Git id
        ---------------------------------------------------------------------------
        autocorrect-node.linux-x64-gnu.node  | 61db0d39 (9.6 MB), 4f04acea (9.4 MB)
        autocorrect-node.linux-x64-musl.node | c5bf586b (9.4 MB), aef00e88 (9.6 MB)

In total, 18 object ids were changed. Full details are logged here:

        C:\Users\ykla\Desktop\SEP-CN\test.bfg-report\2024-05-24\22-40-42

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive
C:\Users\ykla\Desktop\SEP-CN\test>git reflog expire --expire=now --all && git gc --prune=now --aggressive
C:\Users\ykla\Desktop\SEP-CN\test>git remote
origin
C:\Users\ykla\Desktop\SEP-CN\test>git remote remove origin
Note: A branch outside the refs/remotes/ hierarchy was not removed;
to delete it, use:
  git branch -d main
C:\Users\ykla\Desktop\SEP-CN\test>git remote add origin https://github.com/taophilosophy/SEP-CN.git

C:\Users\ykla\Desktop\SEP-CN\test>git push -f --set-upstream origin main
branch 'main' set up to track 'origin/main'.
Everything up-to-date

But if you delete the test directory and clone again, you'll see that everything is back to the way it was. The change only happened locally.

As far as I can tell, this command behaves no differently from git clone --bare here: neither works.
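One difference between the two clone modes that can be checked directly is which refs get copied. Per the git-clone documentation, --mirror maps all refs of the source (on GitHub that includes refs/pull/*), while --bare copies the branch heads (plus tags). A minimal sketch, with hypothetical function names and git assumed on PATH, that counts refs by namespace so a --bare clone and a --mirror clone can be compared side by side:

```python
import subprocess
from collections import Counter


def ref_namespaces(refnames: str) -> Counter:
    """Count ref names by their first two path components, e.g. refs/pull."""
    counts = Counter()
    for name in refnames.split():
        parts = name.split("/")
        key = "/".join(parts[:2]) if len(parts) > 2 else name
        counts[key] += 1
    return counts


def ref_counts(repo_dir: str) -> Counter:
    """Run git for-each-ref in `repo_dir` and summarize its refs by namespace."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "for-each-ref", "--format=%(refname)"],
        check=True, capture_output=True, text=True,
    ).stdout
    return ref_namespaces(out)
```

Running ref_counts on a --bare clone and on a --mirror clone of the same GitHub repository should show refs/pull entries only in the mirror; whether BFG's result then reaches GitHub is a separate question.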

C:\Users\ykla\Desktop\SEP-CN>java -jar C:\Users\ykla\Desktop\SEP-CN\bfg.jar --strip-blobs-bigger-than 2M test

Using repo : C:\Users\ykla\Desktop\SEP-CN\test

Scanning packfile for large blobs: 36652
Scanning packfile for large blobs completed in 222 ms.
Found 4 blob ids for large blobs - biggest=10114104 smallest=9900680
Total size (unpacked)=39948080
Found 2635 objects to protect
Found 23 commit-pointing refs : HEAD, refs/heads/main, refs/pull/1/head, ...

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit 167e5a0d (protected by 'HEAD')

Cleaning
--------

Found 3070 commits
Cleaning commits:       100% (3070/3070)
Cleaning commits completed in 460 ms.

Updating 6 Refs
---------------

        Ref                Before     After
        --------------------------------------
        refs/pull/1/head | 893e91f0 | cb5430e5
        refs/pull/2/head | e6d30510 | 640191ab
        refs/pull/3/head | bb15af62 | 51ee9aab
        refs/pull/4/head | 3cf332d1 | f3f0f301
        refs/pull/5/head | 5afaa66c | 1fd7ed94
        refs/pull/7/head | 0f682ae9 | 90b45a77

Updating references:    100% (6/6)
...Ref update completed in 25 ms.

Commit Tree-Dirt History
------------------------

        Earliest                                              Latest
        |                                                          |
        ..............................................DD....D.......

        D = dirty commits (file tree fixed)
        m = modified commits (commit message or parents changed)
        . = clean commits (no changes to file tree)

                                Before     After
        -------------------------------------------
        First modified commit | 893e91f0 | cb5430e5
        Last dirty commit     | 0f682ae9 | 90b45a77

Deleted files
-------------

        Filename                               Git id
        ---------------------------------------------------------------------------
        autocorrect-node.linux-x64-gnu.node  | 61db0d39 (9.6 MB), 4f04acea (9.4 MB)
        autocorrect-node.linux-x64-musl.node | c5bf586b (9.4 MB), aef00e88 (9.6 MB)

In total, 18 object ids were changed. Full details are logged here:

        C:\Users\ykla\Desktop\SEP-CN\test.bfg-report\2024-05-24\23-06-23

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive