Open leecalcote opened 3 years ago
hi @leecalcote can I take this on?
sure @Jordan-Rob, sorry missed the comment.
@leecalcote @warunicorn19 I think those pack files are not committed to the repo. These files are also required as part of the Git object database: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
Ref: https://stackoverflow.com/questions/49535201/pack-file-remove-it-in-git
Oh okay, so a big no-no on deleting the .git/objects/pack files.
Yep, and I don't think it would matter as .git is not committed to the repo
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hey @warunicorn19 @adithyaakrishna, can you please check out https://github.com/18F/C2/issues/439? It seems like we can reduce the unwanted .pack files.
@adithyaakrishna @warunicorn19 any insights on this? And on @Aju100's approach?
@leecalcote @Chadha93 I want to take up this issue if no one is working on it rn.
All yours @Abhijay007
This issue is being automatically closed due to inactivity. However, you may choose to reopen this issue.
After the site is built, node_modules and .cache take up some space, but both of these directories are .gitignored. The .pack files in the .git directory are the culprit.
--- /layer5 -------------------------------------------------------------------------
2.4 GiB [##############] /.git
1.0 GiB [###### ] /node_modules
510.5 MiB [## ] /.cache
367.0 MiB [## ] /src
271.9 MiB [# ] /public
52.2 MiB [ ] /static
1.8 MiB [ ] package-lock.json
428.0 KiB [ ] /.github
324.0 KiB [ ] /.devcontainer
196.0 KiB [ ] /content-learn
20.0 KiB [ ] gatsby-node.js
20.0 KiB [ ] CONTRIBUTING.md
20.0 KiB [ ] /.vscode
16.0 KiB [ ] gatsby-config.js
12.0 KiB [ ] LICENSE
12.0 KiB [ ] README.md
12.0 KiB [ ] /.husky
8.0 KiB [ ] .DS_Store
4.0 KiB [ ] package.json
4.0 KiB [ ] fonts.css
4.0 KiB [ ] .eslintrc.js
4.0 KiB [ ] GOVERNANCE.md
4.0 KiB [ ] .gitignore
4.0 KiB [ ] Makefile
4.0 KiB [ ] root-wrapper.js
4.0 KiB [ ] CODE_OF_CONDUCT.md
4.0 KiB [ ] Makefile.show-help.mk
4.0 KiB [ ] .babelrc
4.0 KiB [ ] .eslintignore
4.0 KiB [ ] gatsby-browser.js
4.0 KiB [ ] script.sh
4.0 KiB [ ] gatsby-ssr.js
4.0 KiB [ ] .gitattributes
4.0 KiB [ ] .env.development
4.0 KiB [ ] CODEOWNERS
4.0 KiB [ ] CNAME
Total disk usage: 4.6 GiB Apparent size: 4.2 GiB Items: 139,693
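For anyone who wants to reproduce a listing like the one above without a disk-usage tool, here is a minimal Python sketch. The function names `tree_size` and `top_level_sizes` are made up for illustration, and it sums apparent file sizes (`st_size`) rather than blocks on disk, so the totals will be closer to the "Apparent size" figure than to "Total disk usage".

```python
import os
from pathlib import Path


def tree_size(path: Path) -> int:
    """Apparent size in bytes of a file, or of all files under a directory.

    Symlinks are not followed; a symlink counts as the size of the link itself.
    """
    if path.is_symlink() or path.is_file():
        return path.lstat().st_size
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def top_level_sizes(root: str) -> list[tuple[str, int]]:
    """Return (entry name, size in bytes) for each top-level entry, largest first."""
    entries = [(p.name, tree_size(p)) for p in Path(root).iterdir()]
    return sorted(entries, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Run from the root of a local clone; hidden entries such as .git are included.
    for name, size in top_level_sizes("."):
        print(f"{size / 2**20:10.1f} MiB  {name}")
```

Run from the root of a clone, this should put .git at the top of the list if the pack files dominate as described.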
Hey @leecalcote, @Chadha93, can I look into this issue?
@leecalcote I looked through the .pack files that we have in order to identify exactly which blobs are taking up so much space. As far as I can tell, the cause is the assets and media files that we use, such as .png/.jpg/.mp4 files. Running git gc --aggressive helps only a bit, and from looking for ways to reduce the size of the .pack files, the most recommended way is to remove the media entries from the repo and store them elsewhere.
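One way to check which blobs dominate the history is to list every object with its size and group the blobs by file extension. The sketch below is only an illustration, not the exact method used for the comment above. It assumes the text it parses was produced by `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`; the helper names `parse_blobs` and `size_by_extension` are hypothetical. Note that `%(objectsize)` is the uncompressed size, so it overstates what each blob costs inside a pack.

```python
import os
from collections import defaultdict


def parse_blobs(batch_check_output: str) -> list[tuple[int, str]]:
    """Extract (size, path) for every blob line of the assumed batch-check output.

    Each line is expected to look like: "<type> <sha> <size> <path>".
    Commit and tree lines are skipped.
    """
    blobs = []
    for line in batch_check_output.splitlines():
        parts = line.split(" ", 3)
        if len(parts) < 4 or parts[0] != "blob":
            continue
        _type, _sha, size, path = parts
        blobs.append((int(size), path))
    return blobs


def size_by_extension(blobs: list[tuple[int, str]]) -> list[tuple[str, int]]:
    """Total blob size per lower-cased file extension, largest first."""
    totals: dict[str, int] = defaultdict(int)
    for size, path in blobs:
        ext = os.path.splitext(path)[1].lower() or "(none)"
        totals[ext] += size
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

If media extensions such as .png, .jpg, and .mp4 come out on top, that would support moving those files out of the repo history.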
Thanks for looking into this, @XDRAGON2002. Yes, I agree. The size of the .git directory in the comment above reinforces this fact.
FYI @randychilau
Hi @leecalcote,
Unfortunately the process to reduce/clean a repo does not seem straightforward or well documented for a public repo like Layer5.
I have outlined what I believe to be the phases required for this task. Please let me know if there are any items missing, issues overlooked, or questions. It also seems this will require a fair amount of coordination and proper scheduling to execute, especially for the later phases.
I only have a basic understanding of git, so it would be great to have more experienced users review the information below.
Please include whoever else should be in this discussion.
Cheers, Randy
Note:
All of the repo changes can be tested on a clone and uploaded to a new Layer5 test repo for testing/review.
Using Git LFS seems to be a best practice for assets (e.g. image, video, zip files).
The big question is whether to upload the filtered clone and overwrite the existing repo (complex), or create a new repo to upload to (simple). Also there are logistics required in either case (e.g. issues, pull requests, comments, etc).
If you wish to upload a filtered clone to the existing repo, there are many considerations involved as described in the “DISCUSSION” section (points 4, 5, 6) of the filter-repo user manual. Here is one of them:
“People who cloned from the original repo will have old history. When they fetch the new history you force pushed up, unless they do a git reset --hard @{u} on their branches or rebase their local work, git will think they have hundreds or thousands of commits with very similar commit messages as what exist upstream (but which include files you wanted excised from history), and allow the user to merge the two histories, resulting in what looks like two copies of each commit. If they then push this history back up, then everyone now has history with two copies of each commit and the bad files have returned. You’re more likely to succeed in forcing people to get rid of the old history if they have to clone a new URL.”
Here is a glimpse at the potential final result for repo size:
Phase 1: Create a test filtered clone
Remove all unused packages (using tools like depcheck, IDE find to double-check) and any files from the assets and static folders that are not being used anymore (e.g. zip files, confirm with maintainers for actual clone)
Remove untracked files and directories using git clean
Move all large files and/or specified file types to Git LFS. Two methods:
renormalize
Utilize the filter-repo script, which:
“Rapidly rewrite entire repository history using user-specified filters. This is a destructive operation which should not be used lightly; it writes new commits, trees, tags, and blobs corresponding to (but filtered from) the original objects in the repository, then deletes the original history and leaves only the new.”
-- Use git clone --bare for a copy of layer5 and fetch the LFS objects
-- Run the filter-repo script with the --analyze flag, sample:
-- Run filter-repo script with --invert-paths --paths-from-file ./filter-repo/analysis/path-deleted-sizes.txt
Upload test filtered clone to a created Layer5 test repo.
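A possible helper for the filter-repo steps above: as far as I can tell, the path-deleted-sizes.txt report written by --analyze contains two header lines plus size and date columns in front of each path, so it may need to be reduced to bare paths before it is handed to --paths-from-file. The sketch below assumes each data line looks like "<unpacked size> <packed size> <date deleted> <path>"; that layout and the function name `deleted_paths` are assumptions to verify against a real report from the test clone.

```python
def deleted_paths(report_text: str) -> list[str]:
    """Return the path column from an assumed path-deleted-sizes.txt report.

    Lines whose first two fields are not integers (for example the report's
    header lines) are skipped.
    """
    paths = []
    for line in report_text.splitlines():
        fields = line.split(None, 3)  # keep spaces inside the path intact
        if len(fields) < 4:
            continue
        unpacked, packed, _date, path = fields
        if not (unpacked.isdigit() and packed.isdigit()):
            continue
        paths.append(path)
    return paths


def write_paths_file(report_text: str, out_path: str) -> int:
    """Write one path per line to out_path and return how many were written."""
    paths = deleted_paths(report_text)
    with open(out_path, "w", encoding="utf-8") as f:
        for p in paths:
            f.write(p + "\n")
    return len(paths)
```

The resulting file could then be passed as the argument to --invert-paths --paths-from-file on the test clone. Paths with unusual characters may appear quoted in the report, which this sketch does not handle.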
Phase 2: Review test filtered clone for functionality, GitHub Actions, history, etc
If the following are approved and decided:
Process for creating the test filtered clone
Test filtered clone functionality and history
Where to upload the final clone (existing or new repo)
Phase 3: Get the current repo in a finalized state to create filtered clone
all open pull requests should be either closed or merged
“The git filter-repo tool and the BFG Repo-Cleaner rewrite your repository's history, which changes the SHAs for existing commits that you alter and any dependent commits. Changed commit SHAs may affect open pull requests in your repository. We recommend merging or closing all open pull requests before removing files from your repository.” (src)
Notify all current and potential contributors that the repo will be undergoing maintenance and there will be no access or activity to the repo while going through this process.
Phase 4: Create filtered clone and upload to the decided location
Create backup of repository
Go through the approved filtered clone creation process
Upload clone to the decided location
If this is a new location
Phase 5: After upload, update contributors and relevant information
References:
@Nikhil-Ladha
@randychilau FYI - https://discuss.layer5.io/t/looking-for-a-difficult-git-challenge/2996
Current Behavior
The layer5 repo is over 2GB in size due to unwanted .pack files in .git/objects/pack.
Desired Situation
A smaller repo size.
Contributor Resources
The layer5.io website uses Gatsby, React, and GitHub Pages. Site content is found under the master branch.