toddmedema / electrify

Take Charge of the Power Market
http://electrifygame.com
MIT License

Tip: leverage Git LFS to separate source-code from media #50

Closed n-hebert closed 3 months ago

n-hebert commented 3 months ago

Hello! I see that media has crept into the repository itself, causing the clone to be very large and slow.

https://git-lfs.com/ and other solutions can help to reduce the size back down so that building can occur with or without all media assets, and permit flexible changes to the media without growing the repo to absurd proportions.

A git repo of this size should generally be under a megabyte, but the current clone is ~170MB and that'll only get worse.

Fix Steps

  1. Set up a Git LFS endpoint where media can go
  2. Migrate all media to Git LFS
  3. Use https://github.com/newren/git-filter-repo to prune all objects from the entire git history, not just the current commit
  4. Force push the smaller target back, potentially also eliminating all branches on the old base
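As a rough sketch of steps 3 and 4 only: the helper below assumes git-filter-repo is installed, that it runs in a fresh clone with a backup kept elsewhere, and that any media still needed at HEAD has already been replaced by LFS pointers (step 2), since size-based stripping applies to every commit. The function name, the `1M` threshold, and the remote URL argument are illustrative and not taken from the thread.

```shell
# Hypothetical helper: rewrite history so no blob above a size limit remains,
# then force-push the rewritten branches and tags.
prune_and_publish() {
  local limit="$1" remote_url="$2"

  # Drop every blob larger than $limit (e.g. "1M") from all commits.
  git filter-repo --strip-blobs-bigger-than "$limit" || return 1

  # filter-repo removes the 'origin' remote as a safety measure; re-add it.
  git remote add origin "$remote_url" || return 1

  # Overwrite the remote branches and tags with the rewritten history.
  git push --force --all origin && git push --force --tags origin
}

# Example (not run here):
#   prune_and_publish 1M git@github.com:toddmedema/electrify.git
```

Every existing clone has to be re-cloned (or hard-reset) afterwards, which is why this is easiest while there is a single contributor.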

This is a good step to take before going too far or including other collaborators. While it's just yourself coding on it, this is effortless to do. As soon as someone else begins coding, you've lost a lot of the ability to make this change.

Therefore, this is somewhat of an "urgent fix needed". Happy to help with further tips where you need them!

toddmedema commented 3 months ago

This is a very good point! I'll do this tonight

toddmedema commented 3 months ago

@n-hebert did some digging - there aren't huge files stored in the repo any more, so figured it must be a git history thing / LFS might not actually help much. Ran BFG Repo-Cleaner and removed anything in the history over 1 MB, hopefully that helps!

n-hebert commented 3 months ago

Hey @toddmedema. Small correction: they were definitely in the repo, since the history is part of the repo :wink: You may have meant the current commit's tree, which is usually the case as things change and grow; it's quite common that the largest files are in the past.

BFG is okay, but I can report that the repository still definitely has the files. Here's a view of a fresh clone!

$ git clone git@github.com:toddmedema/electrify.git
Cloning into 'electrify'...
remote: Enumerating objects: 11898, done.
remote: Counting objects: 100% (5388/5388), done.
remote: Compressing objects: 100% (822/822), done.
remote: Total 11898 (delta 1938), reused 5297 (delta 1884), pack-reused 6510
Receiving objects: 100% (11898/11898), 168.48 MiB | 12.43 MiB/s, done.
Resolving deltas: 100% (5083/5083), done.
$ cd electrify/
$ du -sh .git
169M    .git
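To see which objects account for that size, one common approach is to list every blob reachable from any ref and sort by size. This is a minimal sketch using only core git commands; the function name `largest_blobs` is made up for illustration.

```shell
# Hypothetical helper: print the N largest blobs in the history of the
# current repository as "<size in bytes> <path>", biggest first.
largest_blobs() {
  local count="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |   # keep blobs only, drop the type column
    sort -k1,1nr |           # numeric sort on size, descending
    head -n "$count"
}

# Example (run inside a clone):
#   largest_blobs 20
```

Paths that show up here but no longer exist in the working tree are the "files in the past" that still ship with every clone.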

It will take a force push and deletion of all the related objects. It's quite an involved process. If you want to have a call to tackle it, maybe we can set-up a time :slightly_smiling_face:

toddmedema commented 3 months ago

Ugh, that's frustrating, BFG said it removed a bunch of stuff. I haven't worked with LFS before, would it actually work if the items are just in the history? If not, we may need to explore the force push / delete option (I did find some instructions for that online too, although they all had a bunch of danger / warning signs, so I wanted to try BFG first)

n-hebert commented 3 months ago

Yeah, that's why I mentioned in the top post to migrate all objects and then do the force push/delete. To be fully confident that no data is lost, simply create an "electrify-private" private repo on GitHub and push all current branches over there. If you're concerned about interrupting service, you can also (or instead) create an "electrify-reduced" private repo to trial solutions in, and then swap the names when it's ready. I've done that before.
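A backup along those lines could be taken with a mirror clone, which copies every branch and tag rather than just the checked-out one. This is a sketch under assumptions: the function name is hypothetical, and the backup URL stands for whatever private repository was created for the purpose.

```shell
# Hypothetical helper: copy all refs from one repository to another.
backup_to_mirror() {
  local source_url="$1" backup_url="$2" workdir
  workdir="$(mktemp -d)" || return 1

  # --mirror fetches all refs into a bare repository.
  git clone --quiet --mirror "$source_url" "$workdir/source.git" || return 1

  # Push every ref, unchanged, to the backup repository.
  git -C "$workdir/source.git" push --quiet --mirror "$backup_url"
}

# Example (not run here; the second URL is a placeholder):
#   backup_to_mirror git@github.com:toddmedema/electrify.git \
#                    git@github.com:toddmedema/electrify-private.git
```

When the source is hosted on GitHub, the mirror also contains read-only `refs/pull/*` refs, and the push may report those as rejected even though branches and tags were copied.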

The danger buttons are all going to be your friend on this one. If any branch still has the large objects in its history, they will appear on every clone (barring some caveats not worth outlining now). So you'll want to make the repo squeaky clean: force push the target back without any large objects, remove every branch that still refers to the old, large history, and then run the routine clean-up.
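The thread does not spell out the "routine clean-up". A commonly used local version, shown here as an assumption rather than the exact procedure meant above, expires the reflogs and then garbage-collects unreachable objects. The function name is hypothetical.

```shell
# Hypothetical helper: drop objects that no ref or reflog entry points to.
purge_unreachable() {
  local repo="${1:-.}"

  # Reflog entries keep old commits alive; expire them all immediately.
  git -C "$repo" reflog expire --expire=now --all || return 1

  # Repack and delete unreachable objects right away instead of after the
  # default grace period.
  git -C "$repo" gc --quiet --prune=now --aggressive
}

# Example (run after the history rewrite):
#   purge_unreachable . && du -sh .git
```

This only shrinks the local clone; the hosting side runs its own garbage collection on its own schedule.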

Again, I've done this a lot in my line of work, so feel free to dm me on Climatebase and we can set up a quick video call to do it. I usually get it done quite quickly.

n-hebert commented 3 months ago

Hey @toddmedema , I just spotted that there's actually migration already built into Git LFS now. It looks like it'll do it all on your side, effortlessly. Now that you have the clone back-up from today, try running this:

git lfs migrate import --include="*.ico" --include="*.jpg" --include="*.mp3" --include="*.png" --include="*.svg" --include=zipalign --include="*.ai" --include="*.psd" --include-ref=refs/heads/master

courtesy of: https://github.com/git-lfs/git-lfs/wiki/Tutorial#migrating-existing-repository-data-to-lfs

It actually should automagically wipe all of the history of the large objects. Then just close down those dependabot branches and delete them and all should be done here!

I'll leave this one to you instead of contributing my alternate repo: it's supposed to be one-touch, it requires privileges, and it will probably work best on the o.g. repo (this one), so it's best for you to run it.

Let me know if that sounds good and/or how it works!

toddmedema commented 3 months ago

@n-hebert cool! Looks like it worked, just pushed that, and now a fresh clone looks like only ~30 MB on the wire / 70 MB on disk!

n-hebert commented 3 months ago

Excellent! I just confirmed here and yeah, the .git directory is only 34M! :partying_face:

$ du -sh .git
34M .git

Yet... quite oddly, I don't actually think it migrated the files :laughing: When I look at .gitattributes, I see only *.psd and I still see the mp3's in the repo. So it looks like we're making progress but not totally done.

Do you want to take a peek at that .gitattributes file and see if you can get the git lfs ls-files to show some output? Currently it's empty for me.

n-hebert commented 3 months ago

It might just be that I mis-used --include. Maybe you need to have some more interesting syntax. I was hoping it would just collate them together, but it looks like it instead did a "last one wins" solution, and only psd was taken?

Just guessing! Still learning about the migrate command.
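For what it's worth, the git-lfs-migrate documentation describes `--include` as taking several patterns in one comma-separated value. Whether repeated `--include` flags really behave as "last one wins" is not confirmed here. The sketch below only builds and prints such a command from the patterns used earlier in the thread, so it can be reviewed before running; `lfs_include_list` is a made-up helper name.

```shell
# Hypothetical helper: join its arguments with commas.
lfs_include_list() {
  local IFS=,
  printf '%s\n' "$*"   # "$*" joins the arguments with the first char of IFS
}

# Patterns taken from the command posted earlier in the thread.
patterns="$(lfs_include_list '*.ico' '*.jpg' '*.mp3' '*.png' '*.svg' 'zipalign' '*.ai' '*.psd')"

# Print the command instead of running it.
printf 'git lfs migrate import --include="%s" --include-ref=refs/heads/master\n' "$patterns"
# prints: git lfs migrate import --include="*.ico,*.jpg,*.mp3,*.png,*.svg,zipalign,*.ai,*.psd" --include-ref=refs/heads/master
```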

toddmedema commented 3 months ago

git lfs ls-files also comes up empty for me, hmm

n-hebert commented 3 months ago

Does it allow one at a time migration?

git lfs migrate import --include="*.ico" --include-ref=refs/heads/master
git lfs migrate import --include="*.jpg" --include-ref=refs/heads/master
git lfs migrate import --include="*.mp3" --include-ref=refs/heads/master
git lfs migrate import --include="*.png" --include-ref=refs/heads/master
git lfs migrate import --include="*.svg" --include-ref=refs/heads/master
git lfs migrate import --include=zipalign --include-ref=refs/heads/master
git lfs migrate import --include="*.ai" --include-ref=refs/heads/master

Maybe we can do it the boring way.

toddmedema commented 3 months ago

Looks like that worked!

toddmedema commented 3 months ago

Although now when I try to clone, I get some errors:

Downloading services/app/src/audio/basic/high.mp3 (9.1 MB)
Error downloading object: services/app/src/audio/basic/high.mp3 (67d0aed): Smudge error: Error downloading services/app/src/audio/basic/high.mp3 (67d0aed2a313e1c6b3a9cc8044f74153b7fe2de01bdcd3bce7c3fae2d956b333): [67d0aed2a313e1c6b3a9cc8044f74153b7fe2de01bdcd3bce7c3fae2d956b333] Object does not exist on the server: [404] Object does not exist on the server

Errors logged to '/Users/todd/code/bob/electrify/.git/lfs/logs/20240318T203324.293874.log'.
Use git lfs logs last to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: services/app/src/audio/basic/high.mp3: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
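The long hex string in that error is the object ID stored in the LFS pointer file that git now tracks in place of the MP3. A pointer is a small text file with `version`, `oid sha256:...`, and `size` lines, so the ID the server is being asked for can be read straight out of the committed blob. A minimal sketch, with a made-up helper name:

```shell
# Hypothetical helper: read an LFS pointer on stdin and print its sha256 OID.
# Prints nothing if the input is not a pointer.
lfs_pointer_oid() {
  sed -n 's/^oid sha256:\([0-9a-f]\{64\}\)$/\1/p'
}

# Example (run inside the clone): 'git show' prints the committed blob as-is,
# without running the LFS smudge filter.
#   git show HEAD:services/app/src/audio/basic/high.mp3 | lfs_pointer_oid
```

If the ID printed matches the one in the 404, the commit is fine and the object itself was simply never uploaded to the LFS server.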

n-hebert commented 3 months ago

Yep! I'm digging into that, too. On the bright side, we're very much on track. Once the objects are in LFS, there's nothing more to worry about; we're there. And I see that LFS now auto-pulls itself, so all that fuss about hooks I mentioned earlier has been relegated to a concern of the past.

n-hebert commented 3 months ago

I wonder if the tutorial failed to specify this --everything or --info switch. https://github.com/git-lfs/git-lfs/issues/3525 When I run git lfs ls-files I see the files there, so we're /really/ close to being done. It just looks like the files aren't wired up properly. It's sad there are still TODOs on the wiki; it looks like this migrate command needs some more docs love. I wager we quote-unquote used it wrong, but there was also no better information there :stuck_out_tongue_closed_eyes:

n-hebert commented 3 months ago

hmm ... did you do the git lfs push origin master --all from the docs? Maybe that was necessary, but it's specified randomly below.

see: https://github.com/git-lfs/git-lfs/issues/3525#issuecomment-752969928
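For reference, the upload step under discussion could be wrapped as below. This is a minimal sketch, assuming git-lfs is installed and the remote is named `origin`; the function name is hypothetical. With `--all` and no refs given, `git lfs push` uploads the LFS objects referenced by all local refs.

```shell
# Hypothetical helper: upload local LFS objects, then list what LFS tracks.
publish_lfs_objects() {
  local remote="${1:-origin}"

  # Send the LFS objects themselves; a plain history push only carries pointers.
  git lfs push --all "$remote" || return 1

  # Show the files LFS manages at the current checkout as a quick sanity check.
  git lfs ls-files
}

# Example (not run here):
#   publish_lfs_objects origin
```

A fresh clone into an empty directory is still the most direct check that the server now has every object.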

toddmedema commented 3 months ago

Woohoo! After struggling with git credential manager / auth stuff for a bit, I got git lfs push origin master --all to work, and it looks like clone is now succeeding too!

n-hebert commented 3 months ago

:tada: :boom: :partying_face: :tada: Alriiight! Go @toddmedema !

$ du -sh .git/* | sort -rh
33M     .git/lfs
1.9M    .git/objects
80K     .git/hooks
32K     .git/logs
28K     .git/refs
24K     .git/index
8.0K    .git/info
4.0K    .git/packed-refs
4.0K    .git/HEAD
4.0K    .git/description
4.0K    .git/config
4.0K    .git/branches

RESOLVED :heavy_check_mark:

toddmedema commented 3 months ago

WOOHOOOO! Thank you so much for your help!

n-hebert commented 3 months ago

Happy to help and looking forward to collab'ing further on the slimmed down repo now that it's ready!

toddmedema commented 3 months ago

Took a stab at cleaning up the unused dependencies to help further (was a little embarrassing... like I mentioned, copied it over from https://expeditiongame.com/ without a deep clean)

Any particular issues strike your fancy? (or, any ideas for issues / improvements from your experience that I haven't spotted?)

n-hebert commented 3 months ago

I'll probably pitch in with the routine upgrades and then see what's next to do. That'll give me some time to ramp up on the code internals and make better guesses about what other little things there are to do. GitHub Actions or build pipelines might be worth chatting about more at that point, too, once I've familiarized myself with the publishing flow.