scalameta / metals

Scala language server with rich IDE features 🚀
https://scalameta.org/metals/
Apache License 2.0

Clean large files from git history #502

olafurpg closed 5 years ago

olafurpg commented 5 years ago

Clones of the Metals repo are getting larger due to the number of fat GIFs and images. We should replace the embedded GIFs/images with external links to hosted assets on GitHub or imgur.com. Then we can use https://rtyley.github.io/bfg-repo-cleaner/ to rewrite the git history to remove the large files. Although rewriting the git history can break URLs to old commits, I prefer to do it now rather than later.
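A minimal sketch of what the BFG-based cleanup could look like, not necessarily the exact procedure used for Metals. The repository URL, the jar path, the `repo-mirror.git` directory name, the `100K` threshold, and the `run`/`clean_history` helpers with their `DRY_RUN` switch are all assumptions for illustration. Note that BFG leaves the files in the latest commit untouched by default, so the large files have to be removed in a normal commit first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: clean_history <repo-url> <bfg-jar> [size-threshold]
clean_history() {
  local url="$1" jar="$2" limit="${3:-100K}"
  local mirror="repo-mirror.git"

  # Work on a bare mirror so that every branch and tag is rewritten.
  run git clone --mirror "$url" "$mirror"

  # Drop every blob larger than the threshold from the history.
  run java -jar "$jar" --strip-blobs-bigger-than "$limit" "$mirror"

  # Expire the reflog and garbage-collect so the removed blobs are really gone.
  run git -C "$mirror" reflog expire --expire=now --all
  run git -C "$mirror" gc --prune=now --aggressive

  # Pushing the mirror replaces the remote history; old commit URLs break here.
  run git -C "$mirror" push
}
```

Running it as `DRY_RUN=1 clean_history git@github.com:scalameta/metals.git /path/to/bfg.jar` only prints the commands, which is a cheap way to review a destructive rewrite before doing it for real.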

gabro commented 5 years ago

Some more info on this. I've analyzed the repo using this script, and here's the list of every file in the git history that weighs >= 100 KiB:

825c3a6da4ab   19MiB bin/scalafmt
7845de0a5004  2.2MiB website/static/img/jump-to-definition.gif
e39a61dd1a7e  2.2MiB img/jump-to-definition.gif
8dc043ceb508  1.2MiB metals/src/main/resources/sbt-launch.jar
3519eb5d39a7  846KiB docs/assets/vscode-document-symbol-command.gif
bf62ecf1041a  677KiB website/static/img/goto-definition.gif
c5f1f7623061  677KiB img/goto-definition.gif
b7a2a00edf16  571KiB metals-bench/flamegraphs/profile-2018-01-21.zip
25805b687feb  474KiB website/static/img/jump-to-definition.gif
8decfcbb0c04  406KiB docs/assets/emacs-demo.gif
f4d8f5ab82d4  371KiB docs/assets/http-run-doctor.png
b5921b9a4db0  368KiB docs/assets/metals-http-client.png
4d31dfc0d5b1  340KiB docs/assets/vscode-sbt-script.png
93ef1e4a2e76  336KiB docs/assets/vscode-sbt-launcher.png
18de5be37420  309KiB docs/assets/vim-document-symbol.png
21c8090c4ece  308KiB website/static/img/hover.gif
7f0f14d6c90a  307KiB img/hover.gif
baa5ad2808f5  289KiB docs/assets/vscode-server-version.png
c24efe7488b4  284KiB docs/assets/emacs-import-changes.png
32adf98481fa  282KiB docs/assets/emacs-import-build-command.png
e708ee256a30  280KiB docs/assets/emacs-import-build.png
59ff7fdb4c05  257KiB docs/assets/vscode-document-symbol.gif
45e4b6041574  254KiB images/colors.png
5b1c975d0d5d  243KiB docs/assets/code-actions.gif
b5d745ff3efc  207KiB docs/assets/atom-demo.gif
390a4ae9c9ec  196KiB website/static/img/accurate-diagnostics.png
7c48dd29ede8  196KiB img/accurate-diagnostics.png
d522bd946f9a  192KiB docs/assets/code-actions.png
2659ab506d81  155KiB docs/assets/atom-import-via-bloop.png
f154627180b4  149KiB docs/assets/sublime-enable-lsp.gif
cb10504b4ad7  145KiB images/screen.png
7911115f73db  144KiB docs/assets/atom-import-changes.png
6f835c27d5c5  141KiB docs/assets/sublime-demo.gif
42d8ff3acb33  131KiB docs/assets/atom-import-changes.png
be62926ca58c  126KiB website/static/img/prisma.png
c9ebf386c689  123KiB docs/assets/vscode-import-build.png
f96dd68f9f40  122KiB docs/assets/atom-run-doctor.png
fc5d55307e6a  120KiB docs/assets/atom-import-build.png
70eff13dfb3c  120KiB docs/assets/metals-slow-task.gif
bdff97ca82c4  117KiB docs/assets/vscode-run-doctor.png
5e23b5aa54e2  113KiB docs/assets/vscode-import-changes.png
8e37fb85ddac  112KiB docs/assets/metals-did-focus.gif
8c6ddae272d8  107KiB docs/assets/vim-import-via-bloop.gif
dd6322597a61  106KiB vscode-extension/package-lock.json
f3af9c014a7e  106KiB vscode-extension/package-lock.json
74b4c57b3132  105KiB vscode-extension/package-lock.json
59faff21b564  103KiB docs/assets/metals-status.gif
60d5f0cbcc4c  103KiB vscode-extension/package-lock.json
05b309ba5d76  102KiB docs/assets/sbt-bloopinstall.png
6c4d930b0e74  101KiB docs/assets/vim-demo.gif
40c277da86e2  100KiB code.gif
dff3ce1bc511  100KiB vscode-extension/package-lock.json

So it looks like - yes - gifs are an issue, but the single biggest file is by far bin/scalafmt.
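A listing like the one above can be produced with plain git plumbing. This is a sketch, not the script that was actually used: the function name `list_large_blobs` and the 102400-byte default (100 KiB) are assumptions. It prints sizes in bytes and full object IDs rather than the abbreviated, human-readable form shown above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list_large_blobs <repo-dir> [min-bytes]
# Prints "<sha> <path>" prefixed by nothing else, sorted by size, for every
# blob reachable from any ref that is at least min-bytes large.
# Output columns: <sha> <path>, preceded by the size used for sorting being
# dropped by the final cut, so each line reads "<sha> <path>" only if you
# change the cut; as written it prints "<size> <sha> <path>".
list_large_blobs() {
  local repo="$1" min="${2:-102400}"

  # rev-list emits "<sha> <path>" for every object in the history;
  # cat-file adds type and size and passes the path through as %(rest).
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file \
      --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk -v min="$min" '$1 == "blob" && $2 >= min' |
    sort -k2,2nr |
    cut -d' ' -f2-   # drop the leading "blob" column
}
```

For example, `list_large_blobs . 1048576` run inside a clone lists only blobs of 1 MiB or more, as `<size> <sha> <path>` lines with the largest first.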

gabro commented 5 years ago

Also, I've tried removing bin/scalafmt and here's the result.

Cloning the current repo

> git clone git@github.com:scalameta/metals.git
Cloning into 'metals'...
remote: Enumerating objects: 153, done.
remote: Counting objects: 100% (153/153), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 16681 (delta 69), reused 106 (delta 36), pack-reused 16528
Receiving objects: 100% (16681/16681), 39.04 MiB | 2.81 MiB/s, done.
Resolving deltas: 100% (7411/7411), done.

Total: 39.04MiB

Cloning https://github.com/gabro/metals-on-a-diet

> git clone git@github.com:gabro/metals-on-a-diet.git
Cloning into 'metals-on-a-diet'...
remote: Enumerating objects: 14855, done.
remote: Counting objects: 100% (14855/14855), done.
remote: Compressing objects: 100% (4997/4997), done.
remote: Total 14855 (delta 6314), reused 14855 (delta 6314), pack-reused 0
Receiving objects: 100% (14855/14855), 14.94 MiB | 758.00 KiB/s, done.
Resolving deltas: 100% (6314/6314), done.

Total: 14.94MiB

Not bad 🎉

gabro commented 5 years ago

Finally, here's the clone of metals after removing all gifs:

> git clone git@github.com:gabro/metals-on-a-diet.git
Cloning into 'metals-on-a-diet'...
remote: Enumerating objects: 14836, done.
remote: Counting objects: 100% (14836/14836), done.
remote: Compressing objects: 100% (4997/4997), done.
remote: Total 14836 (delta 6317), reused 14800 (delta 6294), pack-reused 0
Receiving objects: 100% (14836/14836), 8.69 MiB | 290.00 KiB/s, done.
Resolving deltas: 100% (6317/6317), done.

Total: 8.69MiB

gabro commented 5 years ago

Whoops, it appears the bin/scalafmt thing was my fault! 🙀

https://github.com/scalameta/metals/commit/ec1161fecfaec6824cba1eb2a21806f972c8f67f#diff-8fa9a52f4661568d880572b38a3a7238

Binary files are so easy to miss in the GitHub review interface 😩

olafurpg commented 5 years ago

Thanks a lot for looking into this @gabro, very interesting! There are a few binaries like sbt-launch.jar that we can't avoid. 9 MB is still a great improvement over 40 MB. Before we go ahead with the git rewrite, we should replace the website's links to embedded assets with imgur.com links. Then I'm all 👍 in favor of rewriting the git history.

MasseGuillaume commented 5 years ago

This is what I use for scalafmt: https://github.com/scalacenter/scala-syntax/blob/master/scalafmt

olafurpg commented 5 years ago

Fixed now: large files have been cleaned from the git history, and the repo now weighs only 4 MB (compared to 65 MB previously!).

The recommended steps to rebase WIP branches are:

  1. squash your branch into a single commit, push to your fork
  2. clone the metals repo in a clean new directory
  3. create new branch in the clean directory
  4. git remote add <your fork>
  5. in your clean branch in the clean clone, git cherry-pick <squashed-commit>
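The steps listed above could be scripted roughly as follows. This is a sketch under assumptions: the helper names, the remote name `fork`, the commit message, the use of `git reset --soft` for squashing, and the force-push are illustrative choices, not part of the recommendation. It also adds an explicit `git fetch`, since the squashed commit has to exist in the clean clone before it can be cherry-picked.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: squash <branch> on top of <base> in the old clone and
# push it to the fork. Prints the SHA of the squashed commit.
squash_and_push() {
  local old_clone="$1" branch="$2" base="$3" fork_remote="$4"
  git -C "$old_clone" checkout -q "$branch"
  # Keep the branch's changes staged, but drop its individual commits.
  git -C "$old_clone" reset -q --soft \
    "$(git -C "$old_clone" merge-base "$base" "$branch")"
  git -C "$old_clone" commit -q -m "Squash $branch"
  # Assumption: force-push, because the branch history was just rewritten.
  git -C "$old_clone" push -q --force "$fork_remote" "$branch"
  git -C "$old_clone" rev-parse HEAD
}

# Hypothetical helper: clone the cleaned repo into a new directory, create the
# branch there, add the fork as a remote, and cherry-pick the squashed commit.
replay_on_clean_clone() {
  local upstream_url="$1" fork_url="$2" branch="$3" squashed="$4" clean_dir="$5"
  git clone -q "$upstream_url" "$clean_dir"
  git -C "$clean_dir" checkout -q -b "$branch"
  git -C "$clean_dir" remote add fork "$fork_url"
  # Fetch so the squashed commit is available locally.
  git -C "$clean_dir" fetch -q fork "$branch"
  git -C "$clean_dir" cherry-pick "$squashed"
}
```

A possible invocation is `sha=$(squash_and_push ~/metals-old my-feature master fork)` followed by `replay_on_clean_clone git@github.com:scalameta/metals.git <fork-url> my-feature "$sha" ~/metals-clean`, where the directories and branch name are placeholders.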