quisquous / cactbot

FFXIV TypeScript Raiding Overlay
Apache License 2.0

gh-pages branch is too big #4865

Closed trim21 closed 2 years ago

trim21 commented 2 years ago

I have a year-old local clone of cactbot (HEAD at 16700eb4, "raidemulator: Convert RaidEmulatorAnalysisTimelineUI to typescript (#3106)", 1 year, 3 months ago).

When I pulled it today, I found I had to download 1.67 GiB:

remote: Enumerating objects: 22376, done.
remote: Counting objects: 100% (3747/3747), done.
remote: Compressing objects: 100% (197/197), done.
remote: Total 22376 (delta 3596), reused 3679 (delta 3542), pack-reused 18629
Receiving objects: 100% (22376/22376), 1.67 GiB | 6.29 MiB/s, done.
Resolving deltas: 100% (14320/14320), completed with 855 local objects.

Then I found that the GitHub API currently reports the cactbot repo's size as 1.74 GB (it was less than 100 MB a year ago).

Also, it takes actions/checkout 2 minutes to clone the repo in GitHub Actions...


MaikoTan commented 2 years ago

Maybe the git maintenance command would be useful here.

And yeah, I think it is fair not to keep the git history of gh-pages.

trim21 commented 2 years ago

> maybe git maintenance command is useful here.

That needs a GitHub admin to run it on GitHub's servers, not locally.

And I think removing gh-pages' history would mean most of this 1.74 GB is no longer fetched when a contributor runs git clone.

trim21 commented 2 years ago

The size of the main branch is 63.52 MiB:

# git clone https://github.com/quisquous/cactbot.git --branch main --depth=10000000 cactbot-test
Cloning into 'cactbot-test'...
remote: Enumerating objects: 61899, done.
remote: Counting objects: 100% (163/163), done.
remote: Compressing objects: 100% (135/135), done.
remote: Total 61899 (delta 47), reused 132 (delta 22), pack-reused 61736
Receiving objects: 100% (61899/61899), 63.52 MiB | 9.00 MiB/s, done.
Resolving deltas: 100% (47011/47011), done.

The size of the gh-pages branch is 1.69 GiB:

# git clone https://github.com/quisquous/cactbot.git --branch gh-pages --depth=10000000 cactbot-test
Cloning into 'cactbot-test'...
remote: Enumerating objects: 55939, done.
remote: Counting objects: 100% (204/204), done.
remote: Compressing objects: 100% (134/134), done.
remote: Total 55939 (delta 90), reused 122 (delta 68), pack-reused 55735
Receiving objects: 100% (55939/55939), 1.69 GiB | 1.58 MiB/s, done.
Resolving deltas: 100% (40135/40135), done.
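
The two measurements above work because, per the git-clone documentation, --depth implies --single-branch, so each clone only transfers objects reachable from the named branch. A contributor who only needs main could likely get the same effect with `git clone --single-branch --branch main`. The following is a minimal local sketch of that behavior, not a measurement of cactbot itself: the throwaway repository, file names, and the `demo` identity are all made up for illustration.

```shell
#!/bin/sh
# Sketch: a single-branch clone does not fetch objects that are only
# reachable from another branch. Everything here is a throwaway demo repo.
set -eu

work=$(mktemp -d)
src="$work/src"
dst="$work/dst"

git init -q "$src"
git -C "$src" symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
commit() {
  git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q "$@"
}

# One commit on main.
echo "source" > "$src/app.ts"
git -C "$src" add app.ts
commit -m "main: source"

# One extra commit that only exists on gh-pages.
git -C "$src" checkout -q -b gh-pages
echo "built bundle" > "$src/bundle.js"
git -C "$src" add bundle.js
commit -m "gh-pages: bundle"
pages_sha=$(git -C "$src" rev-parse gh-pages)
git -C "$src" checkout -q main

# file:// forces the normal fetch protocol instead of a local object copy.
git clone -q --single-branch --branch main "file://$src" "$dst"

# List the remote-tracking refs the clone knows about (no gh-pages expected).
git -C "$dst" for-each-ref --format='%(refname)' refs/remotes/origin
```

The gh-pages commit recorded in `pages_sha` should be absent from the clone's object store, which is why the download stays small.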

trim21 commented 2 years ago

The current gh-pages deploy flow has a write amplification problem.

Consider commit bf023b55fdb27a95e7fd50e61d16df01440cfc41: it was caused by a change to one 80 KB trigger file in PR 2975, but it resulted in a 43x bigger commit on gh-pages, because the whole raidboss module had to be bundled again, updating a 3.47 MB raidboss.bundle.js file. (And since the bundle is minified onto one line, git can't pack the versions as deltas to save space.)

And in the future, with a bigger bundle size, it will get worse.

The solution is to remove the gh-pages branch's commit history, or move it to a separate repo, and wait for GitHub to run its scheduled git gc to remove the old objects.

(I just found that minification is disabled now, so it won't be as bad as I said; git may be able to delta-compress the objects in a pack to reduce size.)
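
For reference, the 43x figure can be reproduced from the two sizes quoted above. This is only a back-of-the-envelope sketch: it assumes the factor is simply bundle size divided by trigger file size, and it assumes decimal units (3.47 MB = 3470 KB), neither of which is stated explicitly.

```shell
#!/bin/sh
# Sketch: write amplification as (bytes rewritten on gh-pages) / (bytes changed on main).
# Sizes are the ones quoted in this comment; decimal units are an assumption.
set -eu

source_kb=80      # size of the changed trigger file
bundle_kb=3470    # size of the regenerated raidboss.bundle.js (3.47 MB)

ratio=$(awk -v b="$bundle_kb" -v s="$source_kb" 'BEGIN { printf "%.1f", b / s }')
echo "amplification: ${ratio}x"
```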

quisquous commented 2 years ago

> currently gh-pages deploy flow has a write amplification problem

Should the gh-pages publishing force-push to that branch (or reset --hard to main, and then push a commit on top)? That way we don't accumulate history? I don't think the history of the pages branch is important here, and it's really just there as a place to host the current built state.

(also cc @panicstevenson for general workflow FYI)
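
A force-push publish along those lines could look roughly like the sketch below. It is not cactbot's actual deploy workflow: the bare `remote.git`, the `publish` helper, the `deploy` identity, and the fake `bundle.js` contents are all hypothetical stand-ins. Each publish commits the build output in a fresh repository, so the commit has no parent, and then force-pushes it over gh-pages.

```shell
#!/bin/sh
# Sketch: publish build output as a single parentless commit, replacing gh-pages
# each time so the branch never accumulates history. Demo-only paths and names.
set -eu

work=$(mktemp -d)
remote="$work/remote.git"
git init -q --bare "$remote"          # stands in for the hosted repository

publish() {
  out=$(mktemp -d)                    # stands in for the build output directory
  echo "$1" > "$out/bundle.js"        # pretend this is the freshly built bundle

  # A brand-new repository has no history, so this commit is a root commit.
  git -C "$out" init -q
  git -C "$out" add -A
  git -C "$out" -c user.name=deploy -c user.email=deploy@example.com \
    commit -q -m "deploy: $1"

  # Overwrite whatever gh-pages pointed at before.
  git -C "$out" push -q --force "$remote" HEAD:refs/heads/gh-pages
}

publish "build 1"
publish "build 2"
publish "build 3"

# However many times we publish, gh-pages holds exactly one commit.
git -C "$remote" rev-list --count gh-pages
```

The replaced commits become unreachable but still occupy space on the remote until it garbage-collects them, so the size of a fresh clone drops right away while the reported repository size may lag behind.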

trim21 commented 2 years ago

You can just use the force_orphan option of https://github.com/peaceiris/actions-gh-pages : the history of the gh-pages branch will be removed and only the latest commit is kept on the gh-pages branch.

It will look like this: https://github.com/trim21/webpack-userscript-template/commits/gh-pages

trim21 commented 2 years ago

Looks fine now:

git clone https://github.com/quisquous/cactbot
Cloning into 'cactbot'...
remote: Enumerating objects: 70226, done.
remote: Counting objects: 100% (165/165), done.
remote: Compressing objects: 100% (104/104), done.
remote: Total 70226 (delta 67), reused 137 (delta 57), pack-reused 70061
Receiving objects: 100% (70226/70226), 83.16 MiB | 4.95 MiB/s, done.
Resolving deltas: 100% (52717/52717), done.

quisquous commented 2 years ago

Thanks for the great suggestion!