idanarye / bevy-tnua

A floating character controller for Bevy
https://crates.io/crates/bevy-tnua
Apache License 2.0
180 stars 12 forks

gh-pages branch is huge, increasing the repository size to 50 times what it should be and inflating cloning times #1

Closed: janhohenheim closed this 1 year ago

janhohenheim commented 1 year ago

Cloning the repo currently takes quite a while due to the huge amount of deltas. Would you mind running the BFG Repo Cleaner once?

idanarye commented 1 year ago

Does it rewrite history?

idanarye commented 1 year ago

Also, https://github.com/newren/git-filter-repo/ seems like the more official replacement for git filter-branch (the official docs point to it). Any reason to use BFG Repo Cleaner over it?

idanarye commented 1 year ago

And is this something I'll need to do periodically? Because if it does rewrite history I don't want to do it once this plugin gains users...

janhohenheim commented 1 year ago

It does rewrite history by pruning changes to big files. Some examples:

Since code files will not realistically come to any size worth pruning, their history will stay untouched.

I've never used git-filter-repo. They do have a comparison page with BFG which mentions a bfg-ish mode, but I cannot comment on that.

A BFG clean is preferably done only once, because the best practice is to avoid frequently changing big files that are tracked in git, due to the delta problem. So, if you keep your changes to assets to a minimum, you'll be fine.
If you do need to change them often, you should instead add them to git LFS.
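
Before pruning anything, it can help to see which blobs are actually large. The following is a minimal sketch using only stock git plumbing; the helper name `list_big_blobs` and the 1 MiB default threshold are assumptions for illustration, not something BFG provides.

```shell
# list_big_blobs [THRESHOLD_BYTES]
# Hypothetical helper: prints "<size> <path>" for every blob reachable from
# any ref whose size exceeds THRESHOLD_BYTES, largest first.
list_big_blobs() {
    threshold="${1:-1048576}"   # assumed default: 1 MiB
    # rev-list emits "<sha> <path>"; cat-file echoes the path back as %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v t="$threshold" '
            $1 == "blob" && $2 + 0 > t + 0 {
                size = $2
                sub(/^[^ ]+ [^ ]+ /, "")   # keep only the path, spaces included
                print size, $0
            }' |
        sort -rn
}
```

Run inside a clone, `list_big_blobs 1000000` would list every blob over roughly 1 MB across all branches, which shows whether the weight sits in assets or in generated files.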

idanarye commented 1 year ago

Okay, but I rarely touch the big assets. I do have some GLB files that I use for the

I tried it locally with --strip-blobs-bigger-than 1M. The homepage says 100M, but my repository is only 85M so I figured it won't do much. It did reduce it to 36M, but:

Updating 1 Ref
--------------

    Ref                   Before     After   
    -----------------------------------------
    refs/heads/gh-pages | a3cc02bc | 1867c842

Updating references:    100% (1/1)
...Ref update completed in 16 ms.

Commit Tree-Dirt History
------------------------

    Earliest                                              Latest
    |                                                          |
    ...........................................................D

    D = dirty commits (file tree fixed)
    m = modified commits (commit message or parents changed)
    . = clean commits (no changes to file tree)

                           Before     After   
    -------------------------------------------
    First modified commit | a3cc02bc | 1867c842
    Last dirty commit     | a3cc02bc | 1867c842

Deleted files
-------------

    Filename                 Git id            
    -------------------------------------------
    definitions.rs.html    | c54a0151 (7.6 MB) 
    extensions.rs.html     | b2d97dee (2.5 MB) 
    gl46.rs.html           | 834f8463 (3.7 MB) 
    index.html             | df92c814 (1.1 MB) 
    platformer_2d_bg.wasm  | 911c8f93 (24.4 MB)
    platformer_3d_bg.wasm  | dd2e3b16 (25.0 MB)
    property_bool.rs.html  | 902bc0e5 (1.4 MB) 
    search-index.js        | 176e1f77 (12.9 MB)
    struct.AutoSimd.html   | 75889212 (4.6 MB) 
    struct.Complex.html    | 77ea2a1b (1.0 MB) 
    struct.Matrix.html     | a5d7dee2 (2.7 MB) 
    struct.Unit.html       | 18c326ab (1.1 MB) 
    trait.Clone.js         | 3840206a (1.4 MB) 
    trait.Debug.js         | cceee633 (1.3 MB) 
    trait.Freeze.js        | 0ab99e4a (1.5 MB) 
    ...

It only cleaned the gh-pages! This does make sense - the GitHub pages contain built WASM and HTML files, so it can get quite big. But I don't want to delete the gh-pages! Both latest-docs and online examples depend on it!

BTW - even without running bfg, just doing git gc --prune=now --aggressive cuts the size to 48M.
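
For comparing sizes before and after the `git gc` run mentioned above, here is a rough sketch. The helper name `report_gc_savings` is hypothetical; it only wraps `git count-objects` and `git gc`, and the numbers it prints will differ per repository.

```shell
# report_gc_savings [REPO_DIR]
# Hypothetical helper: prints object-store statistics, repacks aggressively,
# then prints the statistics again so the two can be compared by eye.
report_gc_savings() {
    repo="${1:-.}"
    echo "before:"
    git -C "$repo" count-objects -vH
    # --prune=now drops unreachable loose objects immediately.
    git -C "$repo" gc --prune=now --aggressive --quiet
    echo "after:"
    git -C "$repo" count-objects -vH
}
```

The `size-pack` line in each report is the figure closest to what a clone has to download.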

janhohenheim commented 1 year ago

I see! BFG does not delete the entire GitHub pages branch, but only its history. I suppose that should be fine, shouldn't it? This also means that git clone <URL> --single-branch should be massively faster, so that's good to know.
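
A single-branch clone could be wrapped like this. It is only a sketch: the helper name `clone_single_branch` is made up, and the branch name you pass (for example `main`) is an assumption about the repository's default branch.

```shell
# clone_single_branch URL BRANCH DEST
# Hypothetical helper: fetches only BRANCH, so the history of other
# branches (such as gh-pages) is never downloaded.
clone_single_branch() {
    url="$1"
    branch="$2"
    dest="$3"
    git clone --quiet --single-branch --branch "$branch" "$url" "$dest"
}
```

Afterwards, `git branch -r` inside the clone should show only the requested branch.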

idanarye commented 1 year ago

If we do this, it'll have to be in the CI and update every time gh-pages is updated. I think we'll have to see if it can be done with git filter-repo, because BFG seems to work on the entire repository and I won't feel comfortable unless I can limit this to just gh-pages.
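
One way to limit the rewrite to a single branch, using neither BFG nor git filter-repo, is to replace that branch with one parentless commit holding its current tree. This is a sketch of that alternative, not something the thread settled on; the helper name `squash_branch_history` is hypothetical.

```shell
# squash_branch_history [BRANCH]
# Hypothetical helper: points BRANCH at a new root commit that has the same
# tree as the current tip, discarding that branch's history only.
squash_branch_history() {
    branch="${1:-gh-pages}"
    # Resolve the tree of the current tip; fail if the branch does not exist.
    tree=$(git rev-parse --verify --quiet "refs/heads/$branch^{tree}") || return 1
    # Create a commit with no parents that snapshots that tree.
    new=$(git commit-tree "$tree" -m "Squash $branch history") || return 1
    # Move only this branch; every other ref is left untouched.
    git update-ref "refs/heads/$branch" "$new"
}
```

The result would still need a `git push --force` for that branch, and the old objects only disappear from a local clone after the reflog expires and `git gc` runs.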

janhohenheim commented 1 year ago

Maybe it's already enough if you point out in a ## Contributing section in the readme that people should git clone <URL> --single-branch?

idanarye commented 1 year ago

Isn't this an overreaction? 85MB is not that much, and this is a problem that all repositories that use a gh-pages branch should have. I'll try to find an alternative, or maybe see if there is a way to use a single gh-pages repository for all my repositories, but I see no reason to add such a warning.

janhohenheim commented 1 year ago

Very possible. I just opened the issue because I was cloning both your repo and wanderlust while also compiling something in the background. Wanderlust took something like 5 seconds to clone, while tnua took about 2 minutes to resolve the deltas. It could very well be that Windows was deprioritizing the process a lot in that moment and that I would otherwise not have noticed anything :)

janhohenheim commented 1 year ago

Update: I checked again with no processes running in the background and stuff is way better now. Consider this a user error 😉

idanarye commented 1 year ago

Removing gh-pages reduces the size to 1.7M. And when I actually check out the gh-pages trunk it becomes 1.2G - so I think 85M is a pretty good compression. But yes - I need to find an alternative...

idanarye commented 1 year ago

Maybe I can use this: https://github.com/actions/deploy-pages

janhohenheim commented 1 year ago

Glad you could fix it 🎉

idanarye commented 1 year ago

Without the gh-pages branch:

[idanarye@idanarye tmp]$ time git clone git@github.com:idanarye/bevy-tnua.git
Cloning into 'bevy-tnua'...
remote: Enumerating objects: 666, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 666 (delta 0), reused 3 (delta 0), pack-reused 661
Receiving objects: 100% (666/666), 598.50 KiB | 1.22 MiB/s, done.
Resolving deltas: 100% (427/427), done.

real    0m2.857s
user    0m0.051s
sys     0m0.015s
[idanarye@idanarye tmp]$ du -hd0 bevy-tnua
1.7M    bevy-tnua

idanarye commented 1 year ago

For anyone from the future who stumbles here while googling this problem - in addition to the changes in the action (which you can see in the commit), you also need to go to Settings -> Pages and change the Source to GitHub Actions. Do that after the action finishes, because this will invalidate your old branch-based page (as I've just learned the hard way).