Closed janhohenheim closed 1 year ago
Does it rewrite history?
Also, https://github.com/newren/git-filter-repo/ seems like the more official replacement for git filter-branch
(the official docs point to it). Any reason to use BFG Repo Cleaner over it?
And is this something I'll need to do periodically? Because if it does rewrite history I don't want to do it once this plugin gains users...
It does rewrite history by pruning changes to big files. Some examples:
- If a file foo.glb is no longer in the current main branch, it is removed entirely from the history.
- If a file bar.pdf is still in the current main branch, its history is removed as if it was just added to the repo.
Since code files will not realistically come to any size worth pruning, their history will stay untouched.
I've never used git-filter-repo. They do have a comparison page with BFG which mentions a bfg-ish mode, but I cannot comment on that.
A BFG clean is preferably only done once, because the best practice is to not change big files added to git often because of the delta problem. So, if you keep your changes to assets to a minimum, you'll be fine.
If you do need to change them often, you should instead add them to git LFS.
Okay, but I rarely touch the big assets. I do have some GLB files that I use for the
I tried it locally with --strip-blobs-bigger-than 1M. The homepage says 100M, but my repository is only 85M so I figured it won't do much. It did reduce it to 36M, but:
Updating 1 Ref
--------------
Ref Before After
-----------------------------------------
refs/heads/gh-pages | a3cc02bc | 1867c842
Updating references: 100% (1/1)
...Ref update completed in 16 ms.
Commit Tree-Dirt History
------------------------
Earliest Latest
| |
...........................................................D
D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)
Before After
-------------------------------------------
First modified commit | a3cc02bc | 1867c842
Last dirty commit | a3cc02bc | 1867c842
Deleted files
-------------
Filename Git id
-------------------------------------------
definitions.rs.html | c54a0151 (7.6 MB)
extensions.rs.html | b2d97dee (2.5 MB)
gl46.rs.html | 834f8463 (3.7 MB)
index.html | df92c814 (1.1 MB)
platformer_2d_bg.wasm | 911c8f93 (24.4 MB)
platformer_3d_bg.wasm | dd2e3b16 (25.0 MB)
property_bool.rs.html | 902bc0e5 (1.4 MB)
search-index.js | 176e1f77 (12.9 MB)
struct.AutoSimd.html | 75889212 (4.6 MB)
struct.Complex.html | 77ea2a1b (1.0 MB)
struct.Matrix.html | a5d7dee2 (2.7 MB)
struct.Unit.html | 18c326ab (1.1 MB)
trait.Clone.js | 3840206a (1.4 MB)
trait.Debug.js | cceee633 (1.3 MB)
trait.Freeze.js | 0ab99e4a (1.5 MB)
...
It only cleaned the gh-pages! This does make sense - the GitHub pages contain built WASM and HTML files, so it can get quite big. But I don't want to delete the gh-pages! Both latest-docs and online examples depend on it!
BTW - even without running bfg, just doing git gc --prune=now --aggressive cuts the size to 48M.
I see! BFG does not delete the entire GitHub pages branch, but only its history. I suppose that should be fine, shouldn't it?
This also means that git clone <URL> --single-branch should be massively faster, so that's good to know.
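To illustrate the --single-branch point, here is a minimal local sketch. It assumes a throwaway repository built in a temporary directory; the file names (main.rs, demo_bg.wasm) and the 200 KB random file are stand-ins for illustration, not taken from the real repository. It clones the same repository twice over a file:// URL and checks whether the gh-pages branch came along.

```shell
set -e
work=$(mktemp -d)

# Build a throwaway source repository (hypothetical content).
git init -q "$work/src"
cd "$work/src"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

echo 'fn main() {}' > main.rs
git add main.rs
git commit -q -m "source code"

# A gh-pages branch carrying a large, poorly compressible build artifact.
git checkout -q -b gh-pages
head -c 200000 /dev/urandom > demo_bg.wasm
git add demo_bg.wasm
git commit -q -m "built site"
git checkout -q main

# file:// forces the regular pack transfer instead of a local object copy.
git clone -q "file://$work/src" "$work/full"
git clone -q --single-branch "file://$work/src" "$work/single"

# Record whether each clone received the gh-pages branch.
if git -C "$work/full" show-ref --verify --quiet refs/remotes/origin/gh-pages; then
  full_has_pages=yes
else
  full_has_pages=no
fi
if git -C "$work/single" show-ref --verify --quiet refs/remotes/origin/gh-pages; then
  single_has_pages=yes
else
  single_has_pages=no
fi

echo "full clone has gh-pages:   $full_has_pages"
echo "single clone has gh-pages: $single_has_pages"
# Size of each clone's object store, in KiB.
du -sk "$work/full/.git" "$work/single/.git"
```

Without an explicit --branch, --single-branch follows the branch the remote HEAD points at, which is main in this sketch, so the objects reachable only from gh-pages are never transferred.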
If we do this, it'll have to be in the CI and update every time gh-pages is updated. I think we'll have to see if it can be done with git filter-repo, because BFG seems to work on the entire repository and I won't feel comfortable unless I can limit this to just gh-pages.
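For reference, one way to touch only gh-pages is sketched below. Note that this is neither BFG nor git filter-repo: it swaps in plain git plumbing (git commit-tree and git update-ref) to replace the branch's history with a single parentless commit holding the same files. The repository, file names, and commit messages are hypothetical and built in a temporary directory.

```shell
set -e
work=$(mktemp -d)

# Build a throwaway repository (hypothetical content).
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

echo 'fn main() {}' > main.rs
git add main.rs
git commit -q -m "source code"

# Simulate three deployments on gh-pages.
git checkout -q -b gh-pages
for i in 1 2 3; do
  echo "build $i" > index.html
  git add index.html
  git commit -q -m "deploy $i"
done
git checkout -q main

before=$(git rev-list --count gh-pages)
tree_before=$(git rev-parse 'gh-pages^{tree}')

# Create a parentless commit with the same tree and point gh-pages at it.
new_commit=$(git commit-tree "$tree_before" -m "Squash gh-pages history")
git update-ref refs/heads/gh-pages "$new_commit"

after=$(git rev-list --count gh-pages)
tree_after=$(git rev-parse 'gh-pages^{tree}')
main_count=$(git rev-list --count main)

echo "gh-pages commits before: $before, after: $after"
echo "main commits: $main_count"
```

The main branch is left untouched and the gh-pages content is identical; only its history is gone. On a real remote this would still need a force push of gh-pages, and the old objects stay in the local repository until they are garbage collected.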
Maybe it's already enough if you point out in a ## Contributing section in the readme that people should git clone <URL> --single-branch?
Isn't this an overreaction? 85MB is not that much, and this is a problem that all repositories that use a gh-pages branch should have. I'll try to find an alternative, or maybe see if there is a way to use a single gh-pages repository for all my repositories, but I see no reason to add such a warning.
Very possible. I just opened the issue because I was cloning both your repo and wanderlust while also compiling something in the background. Wanderlust took something like 5 seconds to clone, while tnua took about 2 minutes to resolve the deltas. It could very well be that Windows was deprioritizing the process a lot in that moment and that I would otherwise not have noticed anything :)
Update: I checked again with no processes running in the background and stuff is way better now. Consider this a user error 😉
Removing gh-pages reduces the size to 1.7M. And when I actually check out the gh-pages trunk it becomes 1.2G - so I think 85M is a pretty good compression. But yes - I need to find an alternative...
Maybe I can use this: https://github.com/actions/deploy-pages
Glad you could fix it 🎉
Without the gh-pages branch:
[idanarye@idanarye tmp]$ time git clone git@github.com:idanarye/bevy-tnua.git
Cloning into 'bevy-tnua'...
remote: Enumerating objects: 666, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 666 (delta 0), reused 3 (delta 0), pack-reused 661
Receiving objects: 100% (666/666), 598.50 KiB | 1.22 MiB/s, done.
Resolving deltas: 100% (427/427), done.
real 0m2.857s
user 0m0.051s
sys 0m0.015s
[idanarye@idanarye tmp]$ du -hd0 bevy-tnua
1.7M bevy-tnua
For anyone from the future who stumbles here while googling this problem - in addition to the changes in the action (which you can see in the commit), you also need to go to Settings -> Pages and change the Source to GitHub Actions. Do that after the action finishes, because this will invalidate your old branch-based page (as I've just learned the hard way).
Cloning the repo currently takes quite a while due to the huge amount of deltas. Would you mind running the BFG Repo Cleaner once?