Open matschaffer opened 2 years ago
As a test I pushed the develop branch and a squashed gh-pages to https://github.com/matschaffer/scratch-gui-squashed-gh-pages
It's not tiny, but the clone size seems to have been cut by roughly 80% (~1.9GB to ~400MB).
❯ git clone git@github.com:matschaffer/scratch-gui-squashed-gh-pages.git scratch-gui-squashed-clone
Cloning into 'scratch-gui-squashed-clone'...
remote: Enumerating objects: 42971, done.
remote: Counting objects: 100% (1177/1177), done.
remote: Compressing objects: 100% (415/415), done.
remote: Total 42971 (delta 758), reused 1171 (delta 753), pack-reused 41794
Receiving objects: 100% (42971/42971), 404.64 MiB | 4.44 MiB/s, done.
Resolving deltas: 100% (28426/28426), done.
Another option could be moving gh-pages to another repo that would just be used for publishing rather than development.
The gh-pages assets still definitely make up the larger blobs of the repo.
~/code/LLK/scratch-gui-squashed-clone develop 07:47:49
❯ git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
sed -n 's/^blob //p' |
sort --numeric-sort --key=2 |
cut -c 1-12,41- |
$(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest | tail -n10
d4188146aeac 13MiB hotfix/totally-normal-2021/lib.min.js
17529bb75d07 14MiB develop/lib.min.js
f01d795250c1 14MiB scratch-desktop/lib.min.js
92adc14f8add 14MiB native/lib.min.js
3abbb626ea8e 14MiB stretchy-paint/lib.min.js
5a51340ed329 15MiB color-swatches/lib.min.js
25deb7569f99 18MiB boost/lib.min.js.map
adf5693bed0f 19MiB boost/lib.min.js
7200d49d2105 19MiB centerCrosshair/lib.min.js.map
13af32bdd340 20MiB centerCrosshair/lib.min.js
There are some larger animated GIFs in the source, but they seem to be mostly 1-2MB, whereas the sourcemaps and minified JS files are in the 10-20MB range, and there are many more of those in the published site.
To test the "alternate repo" idea I pushed just develop to another fork and cloned that:
❯ git clone git@github.com:matschaffer/scratch-gui-develop.git --branch develop
Cloning into 'scratch-gui-develop'...
remote: Enumerating objects: 41794, done.
remote: Total 41794 (delta 0), reused 0 (delta 0), pack-reused 41794
Receiving objects: 100% (41794/41794), 306.56 MiB | 270.00 KiB/s, done.
Resolving deltas: 100% (27691/27691), done.
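The "alternate repo" test above could be reproduced roughly as follows. This is a minimal sketch under assumptions: the helper name `push_branch_only` is hypothetical, and the destination is assumed to be an empty repository you can push to. It pushes exactly one branch by explicit refspec, so no other branches (such as gh-pages) are sent along.

```shell
# push_branch_only REPO_DIR BRANCH NEW_REMOTE_URL  (hypothetical helper)
# Pushes only BRANCH from the local repository REPO_DIR to NEW_REMOTE_URL.
push_branch_only() {
    repo_dir=$1
    branch=$2
    new_remote=$3
    # An explicit refspec sends this one branch and nothing else.
    git -C "$repo_dir" push --quiet "$new_remote" "refs/heads/$branch:refs/heads/$branch"
}
```

For example, `push_branch_only . develop git@github.com:matschaffer/scratch-gui-develop.git` would push only develop to the fork used in the test above.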
And confirmed that now it's mainly the gif blobs that make up the space:
❯ git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
sed -n 's/^blob //p' |
sort --numeric-sort --key=2 |
cut -c 1-12,41- |
$(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest | tail -n10
1a6fab408778 2.9MiB src/lib/libraries/decks/steps/chase-game-move-randomly.es.gif
641a44ec604d 5.3MiB src/lib/libraries/decks/txt/09_hoc-spin.gif
6370bf50054c 7.5MiB src/lib/libraries/decks/steps/video-pet.es.gif
af14625dd9af 8.6MiB src/lib/libraries/decks/cartoonnetwork/09_cn-level-up-say-something.gif
526d2f10cbf6 9.2MiB src/lib/libraries/decks/cartoonnetwork/06_cn-keep-score.gif
cf67f5a89b9c 9.9MiB src/lib/libraries/decks/steps/video-animate.es.gif
5ea3d171bcef 10MiB src/lib/libraries/decks/cartoonnetwork/07_cn-level-up.gif
e008255c5b80 11MiB src/lib/libraries/decks/steps/video-pop.es.gif
e02d200429b6 11MiB src/lib/libraries/decks/cartoonnetwork/03_cn-glide-around.gif
474c5790d124 12MiB src/lib/libraries/decks/cartoonnetwork/04_cn-collect.gif
I also came across this post on partial and shallow clones, in case we want to try other options: https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/
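For comparison, the clone variants discussed in that post can be sketched as small shell helpers. This is a minimal sketch, not a recommendation: the function names are hypothetical, while `--depth`, `--filter=blob:none`, and `--single-branch` are standard `git clone` options. A blobless clone only helps if the server supports partial clone.

```shell
# Hypothetical helpers; each takes a repository URL and a target directory.

# Shallow clone: only the latest commit, no earlier history.
clone_shallow() {
    git clone --quiet --depth 1 "$1" "$2"
}

# Blobless partial clone: all commits and trees, but file contents (blobs)
# are downloaded only when a checkout or diff actually needs them.
clone_blobless() {
    git clone --quiet --filter=blob:none "$1" "$2"
}

# Single-branch clone: full history of one branch (third argument) only.
clone_single_branch() {
    git clone --quiet --single-branch --branch "$3" "$1" "$2"
}
```

For example, `clone_blobless git@github.com:LLK/scratch-gui.git scratch-gui` keeps the full commit history, and the large gh-pages blobs should only be downloaded if that branch is actually checked out.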
Expected Behavior
A fully functional git clone without a multi-GB download.
Actual Behavior
A full repo clone downloads ~2GB.
Steps to Reproduce
git clone git@github.com:LLK/scratch-gui.git
There are a couple issues that cover this:
The typical answer is to clone with --depth 1, but this (to my understanding) would leave the clone unable to be used as a development/PR workspace.
(from https://github.com/LLK/scratch-gui/issues/5140#issuecomment-1043921195)
I was curious so I tried this https://stackoverflow.com/a/42544963/69002
It seems like a lot of the space is taken up by dependencies being committed to the gh-pages branch, like https://github.com/LLK/scratch-gui/commit/d33ef36cf87dd64aa6143db2ab9554b1acbabfc0 for example.
Those lib.min.js files seem to be 15MB-20MB each and get committed a few times a day in several different subdirectories.
We could eliminate about half of the current repo size by pushing a fresh gh-pages branch with every build.
Something like:
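A minimal sketch of one way to squash a branch, assuming a local clone that has the gh-pages branch. The helper name `squash_branch` is hypothetical and this is not necessarily the exact procedure used for the test fork. It keeps the branch's current files but replaces its history with a single parentless commit.

```shell
# squash_branch REPO_DIR BRANCH  (hypothetical helper)
# Rewrites BRANCH in the local repository REPO_DIR so that it points at a
# single parentless commit containing exactly the files of the current tip.
squash_branch() {
    repo_dir=$1
    branch=$2
    # Resolve the tree (file snapshot) of the branch tip.
    tree=$(git -C "$repo_dir" rev-parse --verify "refs/heads/$branch^{tree}") || return 1
    # commit-tree without -p creates a root commit: same files, no parents.
    commit=$(git -C "$repo_dir" commit-tree "$tree" -m "Squash $branch history") || return 1
    # Point the branch at the new root commit.
    git -C "$repo_dir" update-ref "refs/heads/$branch" "$commit"
}
```

After `squash_branch . gh-pages`, the result would be published with `git push --force origin gh-pages` (assuming the remote is named origin). Existing clones keep the old objects locally, and the hosted repository only shrinks once the old commits are unreachable and garbage-collected on the server.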
Clients will see something like this on their next pull:
But this doesn't seem to require intervention. I was even able to commit to gh-pages and the next pull rebased successfully.
Doing this would make the repository easier to clone and would probably eliminate a good portion of the 1.5-minute clone time currently seen on CircleCI.
Alternatively we could move gh-pages to a separate repo, but this would require a bit more coordination for anything using the deployed gh-pages site (possibly https://scratch.mit.edu itself? not sure if it's using gh-pages directly or not).