urinieto / msaf

Music Structure Analysis Framework
MIT License

.git folder is 844Mb big #76

Closed andimarafioti closed 4 years ago

andimarafioti commented 6 years ago

While cloning the project I noticed that the .git folder is 844 MB. Perhaps the dataset was part of the project at some point and git is still keeping it in the history? Could any git experts tell whether this is normal?

urinieto commented 6 years ago

Yeah, I was aware of that, and at some point I tried to clean it up, but with little luck. Any help would be appreciated, thanks!

urinieto commented 6 years ago

Any idea how to fix this? Maybe @bmcfee could give us a hand? 💃

bmcfee commented 6 years ago

I haven't done this, but this thread might be useful: https://stackoverflow.com/questions/5613345/how-to-shrink-the-git-folder

urinieto commented 6 years ago

That sounds great, thanks. Running the following command and crossing my fingers:

git repack -a -d --depth=250 --window=250 -f

urinieto commented 6 years ago

Mmhh, that didn't seem to work. I also tried this, but with no luck either: http://stevelorek.com/how-to-shrink-a-git-repository.html

Any other suggestions?

andimarafioti commented 6 years ago

Just some logging.

Running git gc --aggressive gives us a 787 MB folder.
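For anyone who wants to reproduce these folder-size numbers from a script, here is a minimal sketch. The helper name `dir_size_mb` is hypothetical, and it sums apparent file sizes, so the result can differ slightly from what `du` or a file manager reports.

```python
import os

def dir_size_mb(path):
    """Return the total size of regular files under `path`, in MB (10**6 bytes)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so that linked files are not counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total / 1e6

# Usage, from the repository root:
# print(f"{dir_size_mb('.git'):.0f} MB")
```

Calling it before and after each cleanup step makes it easy to see whether that step actually freed anything.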

I then followed the guide that @urinieto posted: http://stevelorek.com/how-to-shrink-a-git-repository.html

First output is:

All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file.
size   pack   SHA                                       location
58440  43836  9fa916e426fd95e63750888e37c5c382cff6f3ce  examples/Run MSAF.ipynb
58426  43822  eb54cfbb1d2b882bf7f3f68c0bf6ad015b8690a0  examples/Run MSAF.ipynb
58404  43771  84715cc02724bc1af7d163b245d674d985877a96  examples/Run MSAF.ipynb
44443  13809  84490d3885af1ebf0b7a5c764e7873905ad4d8a5  datasets/SPAM/features/SPAM_Cerulean_Yes-Starship_Trooper:_A._Life_Seeker  _B._Disillu.json
44292  13727  983fcc34a153ad3d01ee852616b47a580d070ee6  datasets/SPAM/features/SPAM_Cerulean_Miles_Davis_Quintet-Footprints.json
43058  12937  0a78f16cc61c1d46da935e8866ed083bcf8fcfd0  datasets/SPAM/features/SPAM_SALAMI_1482.json
42028  12703  dba38b0123b11b6a63c3c714d37ebdfab47b0cff  datasets/SPAM/features/SPAM_SALAMI_164.json
40088  12049  746d614299ddbee39180f12365d4a0bbf879c5cb  datasets/SPAM/features/SPAM_SALAMI_1198.json
39567  12161  f49bac17c16e5dc2f5c41d0f6fac119ac8791b49  datasets/SPAM/features/SPAM_SALAMI_478.json
39488  12701  3ea3933a7f98bad9cc19dfb56c2a28b12e52032a  datasets/SPAM/features/SPAM_Cerulean_Bob_Dylan-Hurricane.json
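A listing like the one above can also be produced without the guide's script. This is a rough sketch that pipes `git rev-list --objects --all` into `git cat-file --batch-check`. The function names `parse_batch_check` and `largest_blobs` are hypothetical. Sizes are uncompressed bytes rather than kB, and there is no pack column.

```python
import subprocess

def parse_batch_check(text):
    """Parse 'type sha size path' lines; return (size, sha, path) for blobs, largest first."""
    blobs = []
    for line in text.splitlines():
        # Split at most 3 times so that paths containing spaces stay intact.
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # ignore commits, trees and tags
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    blobs.sort(reverse=True)
    return blobs

def largest_blobs(repo=".", top=10):
    """Return the `top` largest blobs reachable from any ref in `repo`."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # %(rest) echoes whatever followed the object name on each input line (the path).
    checked = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(checked)[:top]

# Usage, from inside a clone:
# for size, sha, path in largest_blobs("."):
#     print(size, sha, path)
```

Because it walks every ref, it also reports blobs that were deleted from the working tree long ago, which is exactly what is inflating the clone here.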

This shows that some files under datasets and examples are still stored in the history even though they are no longer in the current version. That guide is written for removing one huge file, whereas here we have many moderately large ones. The next interesting output is this one:

 git filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch filename' --prune-empty -f -- --all
Rewrite 9dbb57d77a1310465a65cc40f1641d083ca74385 (1280/1280) (1406 seconds passed, remaining 0 predicted)
WARNING: Ref 'refs/heads/dtw' is unchanged
WARNING: Ref 'refs/heads/jblsmith-0.1.0-dev' is unchanged
WARNING: Ref 'refs/heads/keunwoochoi-issue18' is unchanged
WARNING: Ref 'refs/heads/master' is unchanged
WARNING: Ref 'refs/heads/revert-27-0.1.0-dev' is unchanged
WARNING: Ref 'refs/heads/siamese' is unchanged
WARNING: Ref 'refs/remotes/origin/master' is unchanged
WARNING: Ref 'refs/remotes/origin/dtw' is unchanged
WARNING: Ref 'refs/remotes/origin/jblsmith-0.1.0-dev' is unchanged
WARNING: Ref 'refs/remotes/origin/keunwoochoi-issue18' is unchanged
WARNING: Ref 'refs/remotes/origin/master' is unchanged
WARNING: Ref 'refs/remotes/origin/revert-27-0.1.0-dev' is unchanged
WARNING: Ref 'refs/remotes/origin/siamese' is unchanged
WARNING: Ref 'refs/tags/v0.0.4' is unchanged
WARNING: Ref 'refs/tags/v0.1.0' is unchanged
WARNING: Ref 'refs/tags/v0.1.1' is unchanged
WARNING: Ref 'refs/tags/v0.1.2' is unchanged
WARNING: Ref 'refs/tags/v0.1.3' is unchanged
WARNING: Ref 'refs/tags/v0.1.4' is unchanged
WARNING: Ref 'refs/tags/v0.1.5' is unchanged
WARNING: Ref 'refs/tags/v0.1.51' is unchanged
WARNING: Ref 'refs/tags/v0.1.6' is unchanged
WARNING: Ref 'refs/tags/v0.1.61' is unchanged
WARNING: Ref 'refs/tags/v0.1.70' is unchanged
v0.0.4 -> v0.0.4 (2a8ac92aff682eb453838b161fce61a036a1b3a7 -> 2a8ac92aff682eb453838b161fce61a036a1b3a7)
v0.1.0 -> v0.1.0 (02cb02ab925b70c8a6f77a8ea24f57dcfbe17ba6 -> 02cb02ab925b70c8a6f77a8ea24f57dcfbe17ba6)
v0.1.1 -> v0.1.1 (8143c4a0b0c99fb9f6edaac82446b3addc74d43f -> 8143c4a0b0c99fb9f6edaac82446b3addc74d43f)
v0.1.2 -> v0.1.2 (a48efec0cbbfcf083644bde70d9d997c66a42141 -> a48efec0cbbfcf083644bde70d9d997c66a42141)
v0.1.3 -> v0.1.3 (2776d4d2696f063a5ea91bdc49de69372c8d7bd8 -> 2776d4d2696f063a5ea91bdc49de69372c8d7bd8)
v0.1.4 -> v0.1.4 (8160c1933c7b2c485582d72ae6a810349654a2c8 -> 8160c1933c7b2c485582d72ae6a810349654a2c8)
v0.1.5 -> v0.1.5 (004144bba8762ce8b39c901dff1f66abdb349320 -> 004144bba8762ce8b39c901dff1f66abdb349320)
v0.1.51 -> v0.1.51 (d54e82526b6897ebedef15076470bf127188449b -> d54e82526b6897ebedef15076470bf127188449b)
v0.1.6 -> v0.1.6 (60cd7c8bd4941afb8bb78e58571e6faeeef955c7 -> 60cd7c8bd4941afb8bb78e58571e6faeeef955c7)
v0.1.61 -> v0.1.61 (91997af5b900970cf97d891da002a25c7809328a -> 91997af5b900970cf97d891da002a25c7809328a)
v0.1.70 -> v0.1.70 (2a6274594c248e729f5ddd78d2b4be8c5fb28f84 -> 2a6274594c248e729f5ddd78d2b4be8c5fb28f84)

After going through the whole guide, the folder is 709 MB. Running the first script again to list the largest tracked files gives exactly the same output, and repeating the rest of the instructions doesn't change anything.
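A possible explanation, assuming the command was run exactly as shown: `filename` is the guide's placeholder, so `git rm --ignore-unmatch` would match nothing, which is consistent with every ref being reported as unchanged. Below is a minimal sketch of building the same filter-branch call for real paths. The helper name `build_filter_branch_cmd` is hypothetical, and the paths in the usage comment are taken from the listing (assuming the notebook name contains a space). The index filter runs through a shell, so each path is quoted.

```python
import shlex

def build_filter_branch_cmd(paths):
    """Build a filter-branch command that drops `paths` from every commit on every ref."""
    # Quote each path: the index filter string is evaluated by a shell.
    quoted = " ".join(shlex.quote(p) for p in paths)
    index_filter = "git rm -r --cached --ignore-unmatch -- " + quoted
    return [
        "git", "filter-branch",
        "--tag-name-filter", "cat",
        "--index-filter", index_filter,
        "--prune-empty", "-f",
        "--", "--all",
    ]

# Usage (this rewrites history, so try it on a throwaway clone first):
# import subprocess
# cmd = build_filter_branch_cmd(["examples/Run MSAF.ipynb", "datasets/SPAM"])
# subprocess.run(cmd, check=True)
```

Passing a directory such as `datasets/SPAM` removes everything under it in one pass, which suits the many-medium-files case better than naming the files one by one.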

urinieto commented 4 years ago

And 2 years later...

I followed these instructions and removed every file over 1 MB from the history by doing this:

java -jar bfg.jar --strip-blobs-bigger-than 1M msaf.git

All files of 1MB or more should be in the msaf-data repo.

The .git folder should now be around 27 MB (instead of 844 MB!). Thanks to all of you for your help!
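For anyone repeating this on another repository: the BFG command above is normally run on a fresh mirror clone and followed by a reflog expiry, a garbage collection and a push. The sketch below lists that sequence as documented by BFG. It is not necessarily the exact set of commands run here. The helper name `bfg_cleanup_commands`, the `repo_url` argument and the default file names are assumptions.

```python
def bfg_cleanup_commands(repo_url, mirror_dir="msaf.git", threshold="1M", bfg_jar="bfg.jar"):
    """Return the usual BFG cleanup sequence as a list of argument lists."""
    return [
        # 1. Work on a bare mirror clone so that every ref gets rewritten.
        ["git", "clone", "--mirror", repo_url, mirror_dir],
        # 2. Strip every blob larger than `threshold` from the history.
        ["java", "-jar", bfg_jar, "--strip-blobs-bigger-than", threshold, mirror_dir],
        # 3. Drop reflog entries that still point at the old objects.
        ["git", "-C", mirror_dir, "reflog", "expire", "--expire=now", "--all"],
        # 4. Physically delete the now-unreferenced objects.
        ["git", "-C", mirror_dir, "gc", "--prune=now", "--aggressive"],
        # 5. Publish the rewritten refs.
        ["git", "-C", mirror_dir, "push"],
    ]

# Usage (replace the placeholder with the real remote URL):
# import subprocess
# for cmd in bfg_cleanup_commands("<repo-url>"):
#     subprocess.run(cmd, check=True)
```

Since this rewrites history, existing clones and forks need a fresh clone afterwards to get the smaller size.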

andimarafioti commented 4 years ago

I am so happy you solved this. Thank you.