andimarafioti closed this issue 4 years ago.
Yeah, I was aware of that, and at some point I tried to clean it up, but with little luck. Any help would be appreciated, thanks!
Any idea how to fix this? Maybe @bmcfee could give us a hand? 💃
I haven't done this, but this thread might be useful: https://stackoverflow.com/questions/5613345/how-to-shrink-the-git-folder
That sounds great, thanks. Running the following command and crossing my fingers:
git repack -a -d --depth=250 --window=250 -f
Mmhh, that didn't seem to work. I also tried this, but with no luck either: http://stevelorek.com/how-to-shrink-a-git-repository.html
Any other suggestions?
Just some logging.
Running git gc --aggressive gives us a 787 MB folder.
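To compare figures like this one from one cleanup attempt to the next, a small Python sketch like the following can total the size of the .git folder. The function names are hypothetical, and the result may differ slightly from du, which counts allocated disk blocks rather than file sizes.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Return the total apparent size, in bytes, of the regular files under `root`."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip symlinks so nothing is counted twice
                total += os.path.getsize(path)
    return total


def as_mib(num_bytes: int) -> str:
    """Format a byte count as mebibytes (1024 * 1024 bytes) with one decimal."""
    return f"{num_bytes / (1024 * 1024):.1f} MiB"


if __name__ == "__main__":
    # Run from the top of a clone; compare the value before and after each cleanup step.
    print(as_mib(dir_size_bytes(".git")))
```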
I then followed this link that @urinieto brought: http://stevelorek.com/how-to-shrink-a-git-repository.html
First output is:
All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file.
size pack SHA location
58440 43836 9fa916e426fd95e63750888e37c5c382cff6f3ce examples/Run MSAF.ipynb
58426 43822 eb54cfbb1d2b882bf7f3f68c0bf6ad015b8690a0 examples/Run MSAF.ipynb
58404 43771 84715cc02724bc1af7d163b245d674d985877a96 examples/Run MSAF.ipynb
44443 13809 84490d3885af1ebf0b7a5c764e7873905ad4d8a5 datasets/SPAM/features/SPAM_Cerulean_Yes-Starship_Trooper:_A._Life_Seeker _B._Disillu.json
44292 13727 983fcc34a153ad3d01ee852616b47a580d070ee6 datasets/SPAM/features/SPAM_Cerulean_Miles_Davis_Quintet-Footprints.json
43058 12937 0a78f16cc61c1d46da935e8866ed083bcf8fcfd0 datasets/SPAM/features/SPAM_SALAMI_1482.json
42028 12703 dba38b0123b11b6a63c3c714d37ebdfab47b0cff datasets/SPAM/features/SPAM_SALAMI_164.json
40088 12049 746d614299ddbee39180f12365d4a0bbf879c5cb datasets/SPAM/features/SPAM_SALAMI_1198.json
39567 12161 f49bac17c16e5dc2f5c41d0f6fac119ac8791b49 datasets/SPAM/features/SPAM_SALAMI_478.json
39488 12701 3ea3933a7f98bad9cc19dfb56c2a28b12e52032a datasets/SPAM/features/SPAM_Cerulean_Bob_Dylan-Hurricane.json
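The listing above pairs each packed object's sizes with a path from history. A rough Python sketch of the same idea follows, assuming the documented output formats of git verify-pack -v (object lines: SHA, type, size, size-in-packfile, offset, plus depth and base SHA for deltified objects) and git rev-list --objects --all (SHA, then path). The helper names are hypothetical, and this is not the guide's own script.

```python
import glob
import subprocess
from typing import Dict, Iterable, List, NamedTuple, Tuple


class PackedBlob(NamedTuple):
    sha: str
    size: int    # "size" column of verify-pack -v, in bytes
    packed: int  # "size-in-packfile" column, in bytes
    path: str


def parse_verify_pack(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map blob SHA -> (size, size-in-packfile) from `git verify-pack -v` output.

    Object lines have 5 fields (7 for deltified objects); summary lines such as
    "non delta: ..." or "chain length = 1: ..." have other shapes and are skipped.
    """
    blobs = {}
    for line in lines:
        fields = line.split()
        if len(fields) in (5, 7) and fields[1] == "blob":
            blobs[fields[0]] = (int(fields[2]), int(fields[3]))
    return blobs


def parse_rev_list(lines: Iterable[str]) -> Dict[str, str]:
    """Map object SHA -> path from `git rev-list --objects --all` output.

    Splitting only once keeps paths with spaces intact, e.g. "examples/Run MSAF.ipynb".
    """
    paths = {}
    for line in lines:
        sha, _, path = line.rstrip("\n").partition(" ")
        if sha:
            paths.setdefault(sha, path)
    return paths


def largest_blobs(pack_lines: Iterable[str], rev_lines: Iterable[str], top: int = 10) -> List[PackedBlob]:
    """Return the `top` blobs by size, each paired with a path seen in history."""
    blobs = parse_verify_pack(pack_lines)
    paths = parse_rev_list(rev_lines)
    ranked = sorted(blobs.items(), key=lambda item: item[1][0], reverse=True)
    return [PackedBlob(sha, size, packed, paths.get(sha, "?")) for sha, (size, packed) in ranked[:top]]


def format_row(blob: PackedBlob) -> str:
    """One row in the column order size, pack, SHA, location, sizes in kB (1024 bytes)."""
    return f"{blob.size // 1024} {blob.packed // 1024} {blob.sha} {blob.path}"


if __name__ == "__main__":
    # Run from the top of a clone that has at least one pack file.
    idx_files = glob.glob(".git/objects/pack/pack-*.idx")
    pack = subprocess.run(["git", "verify-pack", "-v", *idx_files],
                          capture_output=True, text=True, check=True).stdout
    revs = subprocess.run(["git", "rev-list", "--objects", "--all"],
                          capture_output=True, text=True, check=True).stdout
    print("size pack SHA location")
    for blob in largest_blobs(pack.splitlines(), revs.splitlines()):
        print(format_row(blob))
```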
We can see that some files in datasets and examples are still kept in the history even though they are no longer in the current version. However, that guide is written for a single huge file, whereas here we have several smaller ones. The next somewhat interesting output is this one:
git filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch filename' --prune-empty -f -- --all
Rewrite 9dbb57d77a1310465a65cc40f1641d083ca74385 (1280/1280) (1406 seconds passed, remaining 0 predicted)
WARNING: Ref 'refs/heads/dtw' is unchanged
WARNING: Ref 'refs/heads/jblsmith-0.1.0-dev' is unchanged
WARNING: Ref 'refs/heads/keunwoochoi-issue18' is unchanged
WARNING: Ref 'refs/heads/master' is unchanged
WARNING: Ref 'refs/heads/revert-27-0.1.0-dev' is unchanged
WARNING: Ref 'refs/heads/siamese' is unchanged
WARNING: Ref 'refs/remotes/origin/master' is unchanged
WARNING: Ref 'refs/remotes/origin/dtw' is unchanged
WARNING: Ref 'refs/remotes/origin/jblsmith-0.1.0-dev' is unchanged
WARNING: Ref 'refs/remotes/origin/keunwoochoi-issue18' is unchanged
WARNING: Ref 'refs/remotes/origin/master' is unchanged
WARNING: Ref 'refs/remotes/origin/revert-27-0.1.0-dev' is unchanged
WARNING: Ref 'refs/remotes/origin/siamese' is unchanged
WARNING: Ref 'refs/tags/v0.0.4' is unchanged
WARNING: Ref 'refs/tags/v0.1.0' is unchanged
WARNING: Ref 'refs/tags/v0.1.1' is unchanged
WARNING: Ref 'refs/tags/v0.1.2' is unchanged
WARNING: Ref 'refs/tags/v0.1.3' is unchanged
WARNING: Ref 'refs/tags/v0.1.4' is unchanged
WARNING: Ref 'refs/tags/v0.1.5' is unchanged
WARNING: Ref 'refs/tags/v0.1.51' is unchanged
WARNING: Ref 'refs/tags/v0.1.6' is unchanged
WARNING: Ref 'refs/tags/v0.1.61' is unchanged
WARNING: Ref 'refs/tags/v0.1.70' is unchanged
v0.0.4 -> v0.0.4 (2a8ac92aff682eb453838b161fce61a036a1b3a7 -> 2a8ac92aff682eb453838b161fce61a036a1b3a7)
v0.1.0 -> v0.1.0 (02cb02ab925b70c8a6f77a8ea24f57dcfbe17ba6 -> 02cb02ab925b70c8a6f77a8ea24f57dcfbe17ba6)
v0.1.1 -> v0.1.1 (8143c4a0b0c99fb9f6edaac82446b3addc74d43f -> 8143c4a0b0c99fb9f6edaac82446b3addc74d43f)
v0.1.2 -> v0.1.2 (a48efec0cbbfcf083644bde70d9d997c66a42141 -> a48efec0cbbfcf083644bde70d9d997c66a42141)
v0.1.3 -> v0.1.3 (2776d4d2696f063a5ea91bdc49de69372c8d7bd8 -> 2776d4d2696f063a5ea91bdc49de69372c8d7bd8)
v0.1.4 -> v0.1.4 (8160c1933c7b2c485582d72ae6a810349654a2c8 -> 8160c1933c7b2c485582d72ae6a810349654a2c8)
v0.1.5 -> v0.1.5 (004144bba8762ce8b39c901dff1f66abdb349320 -> 004144bba8762ce8b39c901dff1f66abdb349320)
v0.1.51 -> v0.1.51 (d54e82526b6897ebedef15076470bf127188449b -> d54e82526b6897ebedef15076470bf127188449b)
v0.1.6 -> v0.1.6 (60cd7c8bd4941afb8bb78e58571e6faeeef955c7 -> 60cd7c8bd4941afb8bb78e58571e6faeeef955c7)
v0.1.61 -> v0.1.61 (91997af5b900970cf97d891da002a25c7809328a -> 91997af5b900970cf97d891da002a25c7809328a)
v0.1.70 -> v0.1.70 (2a6274594c248e729f5ddd78d2b4be8c5fb28f84 -> 2a6274594c248e729f5ddd78d2b4be8c5fb28f84)
After finishing the whole procedure, the folder is 709 MB. Running the first bash script again to list the biggest tracked files gives exactly the same output, and re-running the rest of the instructions doesn't change anything.
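One possible explanation for every ref being reported as unchanged: in the filter-branch command above, filename is the guide's placeholder for the path to delete, and git rm --ignore-unmatch succeeds silently when nothing matches, so if the placeholder was run literally no commit is rewritten. The hypothetical Python helper below builds the command for real paths. It quotes each path because filter-branch evaluates the filter string in the shell, where a path like examples/Run MSAF.ipynb would otherwise split in two. The example paths come from the size listing, and the sketch only prints the command, since running it rewrites history.

```python
import shlex
from typing import List, Sequence


def filter_branch_argv(paths: Sequence[str]) -> List[str]:
    """Build a git filter-branch argv that removes `paths` from every commit on every ref.

    The --index-filter string is evaluated by the shell, so each path is quoted.
    """
    if not paths:
        raise ValueError("at least one path is required")
    rm_cmd = shlex.join(["git", "rm", "-r", "--cached", "--ignore-unmatch", "--", *paths])
    return [
        "git", "filter-branch",
        "--tag-name-filter", "cat",
        "--index-filter", rm_cmd,
        "--prune-empty", "-f",
        "--", "--all",
    ]


if __name__ == "__main__":
    # Hypothetical example paths from the listing; print only, do not execute here.
    argv = filter_branch_argv(["examples/Run MSAF.ipynb", "datasets/SPAM/features"])
    print(shlex.join(argv))
```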
And 2 years later...
I followed these instructions and removed every historical file over 1MB by doing this:
java -jar bfg.jar --strip-blobs-bigger-than 1M msaf.git
All files of 1MB or more should be in the msaf-data repo.
The .git folder should now be around 27 MB (instead of 844 MB!). Thanks all of you for your help!
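Before stripping history with a size threshold like the BFG command above, it can help to preview which blobs the threshold would catch. A Python sketch follows, assuming the git cat-file --batch-check format string shown in the code and reading "1M" as 1 MiB (an assumption; check the BFG documentation). BFG also leaves the files in the latest commit on HEAD untouched by default, so this preview may list more than BFG actually removes. The function and constant names are hypothetical.

```python
import subprocess
from typing import Iterable, List, Tuple

# Format for `git cat-file --batch-check`; %(rest) echoes the path that
# `git rev-list --objects --all` printed after each object name.
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"

ONE_MIB = 1024 * 1024  # assumption: BFG's "1M" read as 1 MiB


def blobs_over(lines: Iterable[str], threshold: int = ONE_MIB) -> List[Tuple[int, str, str]]:
    """Return (size, sha, path) for blobs strictly larger than `threshold`, largest first."""
    hits = []
    for line in lines:
        # maxsplit=3 keeps paths containing spaces in one piece
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue
        size = int(parts[2])
        path = parts[3] if len(parts) == 4 else ""
        if size > threshold:
            hits.append((size, parts[1], path))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    objects = subprocess.run(["git", "rev-list", "--objects", "--all"],
                             capture_output=True, text=True, check=True).stdout
    batch = subprocess.run(["git", "cat-file", f"--batch-check={BATCH_FORMAT}"],
                           input=objects, capture_output=True, text=True, check=True).stdout
    for size, sha, path in blobs_over(batch.splitlines()):
        print(f"{size // 1024} kB {sha} {path}")
```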
I am so happy you solved this. Thank you.
While cloning the project, I noticed that the .git folder is 844 MB. Perhaps at one point the dataset was part of the project and git is still storing it? Could any git experts tell whether this is normal?