scratchfoundation / scratch-gui

Graphical User Interface for creating and running Scratch 3.0 projects.
https://scratchfoundation.github.io/scratch-gui/develop/
BSD 3-Clause "New" or "Revised" License

Reduce size of gh-pages branch #8059

Open matschaffer opened 2 years ago

matschaffer commented 2 years ago

Expected Behavior

A fully functional git clone without a multi-GB download.

Actual Behavior

Full repo clone takes ~2GB

Steps to Reproduce

git clone git@github.com:LLK/scratch-gui.git

There are a couple issues that cover this:

The typical answer is to clone with --depth 1 but this (to my understanding) would leave the clone unable to be used as a development/PR workspace.

(from https://github.com/LLK/scratch-gui/issues/5140#issuecomment-1043921195)

I was curious so I tried this https://stackoverflow.com/a/42544963/69002

It seems like what's taking up a lot of the space is dependencies being committed to the gh-pages branch. See https://github.com/LLK/scratch-gui/commit/d33ef36cf87dd64aa6143db2ab9554b1acbabfc0 for example.

Those lib.min.js files seem to be 15MB-20MB each and get committed a few times a day in a few different subdirectories.

We could eliminate about half of the current repo size by pushing a fresh gh-pages branch with every build.

Something like:

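# Start a branch with no parent commits; the working tree and index are kept.
git checkout --orphan "gh-pages-${BUILD_NUMBER}"
# Stage new build output too; `commit -a` alone would skip untracked files.
git add --all
git commit -m 'Rebuild gh-pages'
git push --force origin "gh-pages-${BUILD_NUMBER}:gh-pages"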

Clients will see something like this on their next pull:

 + 03a60aa...8e48d06 gh-pages   -> origin/gh-pages  (forced update)

But this doesn't seem to require intervention. I was even able to commit to gh-pages and the next pull rebased successfully.
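
The squash-and-force-push flow above can be tried without touching the real repository. The following is a minimal sketch in a throwaway sandbox, not the project's actual build script: the directory names (`origin.git`, `ci`, `contributor`), the tiny stand-in `lib.min.js`, and the hard-coded `BUILD_NUMBER` are all assumptions made for illustration.

```shell
set -eu
# Hypothetical sandbox: a bare "origin", a CI checkout, and a contributor clone.
sandbox=$(mktemp -d)
cd "$sandbox"
git init --quiet --bare origin.git

# "ci" plays the build machine that publishes the site.
git init --quiet ci
cd ci
git config user.email ci@example.invalid
git config user.name ci
git remote add origin ../origin.git
for build in 1 2 3; do
  echo "bundle from build $build" > lib.min.js
  git add lib.min.js
  git commit --quiet -m "Build $build"
done
git branch -M gh-pages
git push --quiet origin gh-pages

# A contributor clones while gh-pages still carries every build.
git clone --quiet --branch gh-pages ../origin.git ../contributor
before=$(git -C ../contributor rev-list --count origin/gh-pages)

# The next build publishes a single root commit instead of appending one.
BUILD_NUMBER=4   # assumed; CI would normally provide this
echo "bundle from build $BUILD_NUMBER" > lib.min.js
git checkout --quiet --orphan "gh-pages-${BUILD_NUMBER}"
git add --all
git commit --quiet -m 'Rebuild gh-pages'
git push --quiet --force origin "gh-pages-${BUILD_NUMBER}:gh-pages"

# The contributor's next fetch picks up the forced update without intervention.
git -C ../contributor fetch --quiet origin
after=$(git -C ../contributor rev-list --count origin/gh-pages)
echo "commits on origin/gh-pages: before=$before after=$after"
```

In this sandbox the contributor sees three commits on origin/gh-pages before the forced update and one afterwards. The old commits stay in the contributor's local object store until garbage collection, but fresh clones no longer download them.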

Doing this would make the repository easier to clone, and would probably also eliminate a good portion of the 1.5-minute clone time currently seen on CircleCI.

Alternatively, we could move gh-pages to a separate repo, but this would require a bit more coordination for anything using the deployed gh-pages site (possibly https://scratch.mit.edu itself? Not sure whether it uses gh-pages directly).

matschaffer commented 2 years ago

As a test I pushed the develop branch and a squashed gh-pages to https://github.com/matschaffer/scratch-gui-squashed-gh-pages

It's not tiny, but the clone size seems to have been cut by almost 80% (~1.9GB to ~400MB).

❯ git clone git@github.com:matschaffer/scratch-gui-squashed-gh-pages.git scratch-gui-squashed-clone
Cloning into 'scratch-gui-squashed-clone'...
remote: Enumerating objects: 42971, done.
remote: Counting objects: 100% (1177/1177), done.
remote: Compressing objects: 100% (415/415), done.
remote: Total 42971 (delta 758), reused 1171 (delta 753), pack-reused 41794
Receiving objects: 100% (42971/42971), 404.64 MiB | 4.44 MiB/s, done.
Resolving deltas: 100% (28426/28426), done.

Another option could be moving gh-pages to another repo that would just be used for publishing rather than development.
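
Short of moving gh-pages to another repo, an individual clone can already leave it out with `git clone --single-branch`, which only fetches history reachable from the named branch. The sketch below shows that behaviour in a throwaway sandbox; the repository layout (`source`, `develop-only`) and file contents are assumptions for illustration, not the real scratch-gui tree.

```shell
set -eu
# Hypothetical sandbox repository with a source branch and a publishing branch.
sandbox=$(mktemp -d)
cd "$sandbox"
git init --quiet source
cd source
git config user.email dev@example.invalid
git config user.name dev
echo "source code" > index.js
git add index.js
git commit --quiet -m "Add source"
git branch -M develop

# gh-pages shares no history with develop, as a published-site branch usually would.
git checkout --quiet --orphan gh-pages
git rm --quiet -rf .
echo "built bundle" > lib.min.js
git add lib.min.js
git commit --quiet -m "Publish site"
git checkout --quiet develop
cd ..

# Clone only develop; objects reachable solely from gh-pages are not transferred.
git clone --quiet --single-branch --branch develop "file://$sandbox/source" develop-only

if git -C develop-only rev-parse --verify --quiet refs/remotes/origin/gh-pages >/dev/null; then
  has_pages=yes
else
  has_pages=no
fi
echo "gh-pages present in single-branch clone: $has_pages"
```

A single-branch clone keeps full history for its one branch, so it remains usable for development. The trade-off is that later fetches only update that branch unless the remote's fetch refspec is widened.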

matschaffer commented 2 years ago

The gh-pages assets still definitely make up the larger blobs of the repo.

~/code/LLK/scratch-gui-squashed-clone (develop)
❯ git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2 |
  cut -c 1-12,41- |
  $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest | tail -n10
d4188146aeac   13MiB hotfix/totally-normal-2021/lib.min.js
17529bb75d07   14MiB develop/lib.min.js
f01d795250c1   14MiB scratch-desktop/lib.min.js
92adc14f8add   14MiB native/lib.min.js
3abbb626ea8e   14MiB stretchy-paint/lib.min.js
5a51340ed329   15MiB color-swatches/lib.min.js
25deb7569f99   18MiB boost/lib.min.js.map
adf5693bed0f   19MiB boost/lib.min.js
7200d49d2105   19MiB centerCrosshair/lib.min.js.map
13af32bdd340   20MiB centerCrosshair/lib.min.js

There are some larger animated GIFs in the source, but they seem to be mostly 1-2MB, whereas the source maps and minified JS files are in the 10-20MB range, and there are many more of those in the published site.

matschaffer commented 2 years ago

To test the "alternate repo" idea I pushed just develop to another fork and cloned that:

❯ git clone git@github.com:matschaffer/scratch-gui-develop.git --branch develop
Cloning into 'scratch-gui-develop'...
remote: Enumerating objects: 41794, done.
remote: Total 41794 (delta 0), reused 0 (delta 0), pack-reused 41794
Receiving objects: 100% (41794/41794), 306.56 MiB | 270.00 KiB/s, done.
Resolving deltas: 100% (27691/27691), done.

And confirmed that now it's mainly the gif blobs that make up the space:

❯ git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2 |
  cut -c 1-12,41- |
  $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest | tail -n10
1a6fab408778  2.9MiB src/lib/libraries/decks/steps/chase-game-move-randomly.es.gif
641a44ec604d  5.3MiB src/lib/libraries/decks/txt/09_hoc-spin.gif
6370bf50054c  7.5MiB src/lib/libraries/decks/steps/video-pet.es.gif
af14625dd9af  8.6MiB src/lib/libraries/decks/cartoonnetwork/09_cn-level-up-say-something.gif
526d2f10cbf6  9.2MiB src/lib/libraries/decks/cartoonnetwork/06_cn-keep-score.gif
cf67f5a89b9c  9.9MiB src/lib/libraries/decks/steps/video-animate.es.gif
5ea3d171bcef   10MiB src/lib/libraries/decks/cartoonnetwork/07_cn-level-up.gif
e008255c5b80   11MiB src/lib/libraries/decks/steps/video-pop.es.gif
e02d200429b6   11MiB src/lib/libraries/decks/cartoonnetwork/03_cn-glide-around.gif
474c5790d124   12MiB src/lib/libraries/decks/cartoonnetwork/04_cn-collect.gif

matschaffer commented 2 years ago

I also came across this post on partial and shallow clones, in case we want to try other options: https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/
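
As a rough illustration of the shallow-clone option, the sketch below clones a throwaway repository with `--depth 1` and later restores the full history with `git fetch --unshallow`. The sandbox layout and commit contents are assumptions for illustration. A partial clone (`git clone --filter=blob:none <url>`) is the other option that post covers, but it needs server-side filter support and is not exercised here.

```shell
set -eu
# Hypothetical sandbox repository with five commits.
sandbox=$(mktemp -d)
cd "$sandbox"
git init --quiet source
cd source
git config user.email dev@example.invalid
git config user.name dev
for n in 1 2 3 4 5; do
  echo "revision $n" > file.txt
  git add file.txt
  git commit --quiet -m "Commit $n"
done
cd ..

# --depth is ignored for plain local paths, so clone through a file:// URL.
git clone --quiet --depth 1 "file://$sandbox/source" shallow
shallow_count=$(git -C shallow rev-list --count HEAD)

# Fetch the history that the shallow clone skipped.
git -C shallow fetch --quiet --unshallow
full_count=$(git -C shallow rev-list --count HEAD)
echo "commits visible: shallow=$shallow_count after-unshallow=$full_count"
```

Here the shallow clone sees one commit and the unshallowed clone sees all five. Unshallowing brings back the full download cost, so on its own this defers the problem rather than shrinking the repository.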