layer5io / layer5

Layer5, expect more from your infrastructure
https://layer5.io
Apache License 2.0

Large repo size: unwanted .pack files #2163

Open leecalcote opened 3 years ago

leecalcote commented 3 years ago

Current Behavior: The layer5 repo is over 2 GB in size due to unwanted .pack files in .git/objects/pack.

Desired Situation: A smaller repo size.


Contributor Resources

The layer5.io website uses Gatsby, React, and GitHub Pages. Site content is found under the master branch.

Jordan-Rob commented 3 years ago

hi @leecalcote can I take this on?

warunicorn19 commented 3 years ago

sure @Jordan-Rob, sorry missed the comment.

adithyaakrishna commented 3 years ago

@leecalcote @warunicorn19 I think those pack files are not committed to the repo, and they are required as part of Git's object database: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects

Ref: https://stackoverflow.com/questions/49535201/pack-file-remove-it-in-git

warunicorn19 commented 3 years ago

ohh okay, so a big NO NO on deleting the .git/objects/pack files.

adithyaakrishna commented 3 years ago

Yep, and I don't think it would matter as .git is not committed to the repo

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Aju100 commented 2 years ago

Hey @warunicorn19 @adithyaakrishna, can you please check out https://github.com/18F/C2/issues/439? It seems like we can reduce the unwanted .pack files.

Chadha93 commented 2 years ago

@adithyaakrishna @warunicorn19 any insights on this? And on @Aju100's approach?

Abhijay007 commented 2 years ago

@leecalcote @Chadha93 I want to take up this issue if no one is working on it rn.

Chadha93 commented 2 years ago

All yours @Abhijay007

stale[bot] commented 2 years ago

This issue is being automatically closed due to inactivity. However, you may choose to reopen this issue.

leecalcote commented 2 years ago

After the site is built, node_modules and .cache take up some space, but both of these directories are .gitignored. The .pack files in the .git directory are the culprit.

--- /layer5 -------------------------------------------------------------------------
    2.4 GiB [##############] /.git
    1.0 GiB [######        ] /node_modules
  510.5 MiB [##            ] /.cache
  367.0 MiB [##            ] /src
  271.9 MiB [#             ] /public
   52.2 MiB [              ] /static
    1.8 MiB [              ]  package-lock.json
  428.0 KiB [              ] /.github
  324.0 KiB [              ] /.devcontainer
  196.0 KiB [              ] /content-learn
   20.0 KiB [              ]  gatsby-node.js
   20.0 KiB [              ]  CONTRIBUTING.md
   20.0 KiB [              ] /.vscode
   16.0 KiB [              ]  gatsby-config.js
   12.0 KiB [              ]  LICENSE
   12.0 KiB [              ]  README.md
   12.0 KiB [              ] /.husky
    8.0 KiB [              ]  .DS_Store
    4.0 KiB [              ]  package.json
    4.0 KiB [              ]  fonts.css
    4.0 KiB [              ]  .eslintrc.js
    4.0 KiB [              ]  GOVERNANCE.md
    4.0 KiB [              ]  .gitignore
    4.0 KiB [              ]  Makefile
    4.0 KiB [              ]  root-wrapper.js
    4.0 KiB [              ]  CODE_OF_CONDUCT.md
    4.0 KiB [              ]  Makefile.show-help.mk
    4.0 KiB [              ]  .babelrc
    4.0 KiB [              ]  .eslintignore
    4.0 KiB [              ]  gatsby-browser.js
    4.0 KiB [              ]  script.sh
    4.0 KiB [              ]  gatsby-ssr.js
    4.0 KiB [              ]  .gitattributes
    4.0 KiB [              ]  .env.development
    4.0 KiB [              ]  CODEOWNERS
    4.0 KiB [              ]  CNAME
 Total disk usage:   4.6 GiB  Apparent size:   4.2 GiB  Items: 139,693
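
A quick way to confirm how much of that .git figure is packed history is to ask Git directly. The sketch below is an illustration rather than anything posted in this thread: `report_pack_size` and `pack_size_kib` are hypothetical helper names, and the only Git command used is `git count-objects -v`, which reports `size-pack` in KiB.

```shell
#!/usr/bin/env bash
# Sketch: report how much disk space the packed Git history uses.

# pack_size_kib reads `git count-objects -v` output on stdin and prints
# the value of the size-pack line (Git reports it in KiB).
pack_size_kib() {
  awk -F': ' '$1 == "size-pack" { print $2 }'
}

# report_pack_size [repo-dir]
# Hypothetical wrapper: runs the real command in the given repository
# (default: the current directory) and prints the pack size in KiB.
report_pack_size() {
  git -C "${1:-.}" count-objects -v | pack_size_kib
}
```

Running `report_pack_size /path/to/layer5` before and after any cleanup attempt gives a single number to compare, which is easier to track than a full directory listing.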

aaqilrk commented 2 years ago

Hey @leecalcote, @Chadha93, can I look into this issue?

XDRAGON2002 commented 2 years ago

@leecalcote I looked through the .pack files to identify exactly which blobs are taking up so much space. As far as I can tell, the cause is the assets and media files that we use, such as .png/.jpg/.mp4 files. Running git gc --aggressive helps only a bit. The most commonly recommended way to reduce the size of the .pack files is to remove the media entries from the repo and store them elsewhere.
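
For anyone who wants to repeat that inspection, here is a minimal sketch of one common way to list the largest blobs in a repository's history. It is not necessarily the method used above. `rank_blobs` and `largest_blobs` are hypothetical helper names; the Git commands are `git rev-list --objects --all` and `git cat-file --batch-check` with a format string. Sizes are uncompressed object sizes, so they overstate what each blob costs inside a pack.

```shell
#!/usr/bin/env bash
# Sketch: list the largest blobs reachable from any ref.

# rank_blobs [N]
# Reads lines of the form "<type> <sha> <size-in-bytes> <path>" on stdin
# and prints the N largest blobs (default 20) as "<size><TAB><path>",
# largest first. Paths containing spaces are kept intact.
rank_blobs() {
  awk '$1 == "blob" {
         size = $3
         sub(/^[^ ]+ [^ ]+ [^ ]+ ?/, "")   # drop type, sha, and size
         print size "\t" $0
       }' | sort -k1,1nr | head -n "${1:-20}"
}

# largest_blobs [repo-dir] [N]
# Hypothetical wrapper that feeds every object in history to rank_blobs.
largest_blobs() {
  git -C "${1:-.}" rev-list --objects --all \
    | git -C "${1:-.}" cat-file \
        --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | rank_blobs "${2:-20}"
}
```

Calling `largest_blobs . 10` from the repo root would show whether media files such as .png, .jpg, and .mp4 dominate the top of the list.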

leecalcote commented 2 years ago

Thanks for looking into this, @XDRAGON2002. Yes, I agree. The size of the .git directory in the comment above reinforces this fact.

leecalcote commented 1 year ago

FYI @randychilau

randychilau commented 1 year ago

Hi @leecalcote,

Unfortunately the process to reduce/clean a repo does not seem straightforward or well documented for a public repo like Layer5.

I have outlined what I believe to be the phases required for this task. Please let me know if there are any missing items, overlooked issues, or questions. It also seems this will require a fair amount of coordination and proper scheduling to execute, especially for the later phases.

I only have a basic understanding of git, so it would be great to have more experienced users review the information below.

Please include whoever else should be in this discussion.

Cheers, Randy


Note: [image: before_after]


Phase 1: Create a test filtered clone

  1. Remove all unused packages (using tools like depcheck, IDE find to double-check) and any files from assets and static folders that are not being used anymore (e.g. zip files, confirm with maintainers for actual clone)

  2. Remove untracked files and directories using git clean

  3. Install and configure Git LFS.

  4. Move all large files and/or specified file types to Git LFS (two methods):

  5. Utilize the filter-repo script which:

    “Rapidly rewrite entire repository history using user-specified filters. This is a destructive operation which should not be used lightly; it writes new commits, trees, tags, and blobs corresponding to (but filtered from) the original objects in the repository, then deletes the original history and leaves only the new.”

    • Use git clone --bare to make a copy of layer5 and fetch LFS objects
    • Run the filter-repo script with the --analyze flag (sample output: [image: filter-repo2])
    • Run the filter-repo script with --invert-paths --paths-from-file ./filter-repo/analysis/path-deleted-sizes.txt

  6. Upload test filtered clone to a created Layer5 test repo.
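
The filter-repo part of Phase 1 can be sketched as a small script. This is a hedged illustration, not a tested procedure for Layer5. It assumes git-filter-repo is installed separately, and `paths_from_report` and `filter_test_clone` are hypothetical names. It also assumes that each data row of the analysis report is laid out as "unpacked size, packed size, date, path" after two header lines. Because of those headers and size columns, the sketch extracts a plain path list first instead of passing the report to --paths-from-file directly. The LFS fetch step is left out, and filter-repo may refuse to run if it does not consider the clone fresh.

```shell
#!/usr/bin/env bash
# Sketch of the bare-clone, analyze, and filter steps from Phase 1.

# paths_from_report
# Reads a filter-repo analysis report on stdin and prints only the paths.
# ASSUMPTION: data rows look like "<unpacked> <packed> <date> <path>";
# check this against your own analysis output before relying on it.
paths_from_report() {
  awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
         sub(/^[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]+[^[:space:]]+[[:space:]]+/, "")
         print
       }'
}

# filter_test_clone <repo-url> <work-dir>
# Rewrites history in a throwaway bare clone. Destructive by design, so
# never point it at your only copy. Runs in a subshell so the caller's
# working directory is left unchanged.
filter_test_clone() (
  git clone --bare "$1" "$2" || exit 1
  cd "$2" || exit 1
  git filter-repo --analyze || exit 1
  paths_from_report < filter-repo/analysis/path-deleted-sizes.txt \
    > paths-to-remove.txt || exit 1
  git filter-repo --invert-paths --paths-from-file paths-to-remove.txt
)
```

Reviewing paths-to-remove.txt by hand before the final filter-repo run is a cheap safety check, since every listed path is removed from all of history.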


Phase 2: Review test filtered clone for functionality, GitHub Actions, history, etc


If the following are approved and decided:

  1. Process for creating the test filtered clone

  2. Test filtered clone functionality and history

  3. Where to upload the final clone (existing or new repo)


Phase 3: Get the current repo in a finalized state to create filtered clone

  1. All open pull requests should be either closed or merged.

    “The git filter-repo tool and the BFG Repo-Cleaner rewrite your repository's history, which changes the SHAs for existing commits that you alter and any dependent commits. Changed commit SHAs may affect open pull requests in your repository. We recommend merging or closing all open pull requests before removing files from your repository.” (src)

  2. Notify all current and potential contributors that the repo will be undergoing maintenance and that there will be no access to, or activity on, the repo during this process.


Phase 4: Create filtered clone and upload to the decided location

  1. Create backup of repository

  2. Go through the approved filtered clone creation process

  3. Upload clone to the decided location

  4. If this is a new location

    • transfer/migrate information (e.g. issues)
    • build site and make sure custom url is pointing to correct repo/branch and the site is live.
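
For the backup in step 1 of this phase, one possible approach is a mirror clone plus a single-file bundle, sketched below. The function name `backup_repo` and the output layout (repo.git and repo.bundle) are assumptions made for illustration. The Git commands used are `git clone --mirror`, `git bundle create --all`, and `git bundle verify`.

```shell
#!/usr/bin/env bash
# Sketch: back up every ref of a repository before rewriting its history.

# backup_repo <repo-url-or-path> <backup-dir>
# Creates <backup-dir>/repo.git (mirror clone, all refs) and
# <backup-dir>/repo.bundle (single-file copy), then verifies the bundle.
# Runs in a subshell and exits non-zero on the first failing step.
backup_repo() (
  mkdir -p "$2" || exit 1
  dest=$(cd "$2" && pwd) || exit 1   # absolute path, safe to use with -C
  git clone --quiet --mirror "$1" "$dest/repo.git" || exit 1
  git -C "$dest/repo.git" bundle create "$dest/repo.bundle" --all || exit 1
  git -C "$dest/repo.git" bundle verify "$dest/repo.bundle"
)
```

The bundle is a single file that can be copied off the machine and later cloned from like a normal remote, which makes it a convenient fallback if the rewritten history turns out to be unusable.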

Phase 5: After upload, update contributors and relevant information

  1. Update CONTRIBUTING.md and related files and text to include any instruction changes (e.g. using LFS)
  2. Notify contributors of the actions to take to reconcile with the new repo (e.g. re-clone from the new clone URL).
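
If LFS is adopted, the updated instructions would need to state which file types are tracked. As a rough sketch, the hypothetical helper `lfs_attributes` below prints .gitattributes rules in the form that `git lfs track` writes. The choice of extensions is an assumption based on the media types mentioned earlier in this thread.

```shell
#!/usr/bin/env bash
# Sketch: generate .gitattributes rules for files that should go to Git LFS.

# lfs_attributes <ext>...
# Prints one rule per extension, matching the line format that
# `git lfs track "*.<ext>"` adds to .gitattributes.
lfs_attributes() {
  for ext in "$@"; do
    printf '*.%s filter=lfs diff=lfs merge=lfs -text\n' "$ext"
  done
}
```

For example, `lfs_attributes png jpg mp4 >> .gitattributes` would append three rules. Each contributor would also need to run `git lfs install` once on their machine before cloning or committing tracked files.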

leecalcote commented 1 year ago

@Nikhil-Ladha

leecalcote commented 1 year ago

@randychilau FYI - https://discuss.layer5.io/t/looking-for-a-difficult-git-challenge/2996