wutiejun opened 8 years ago
https://git-scm.com/book/en/v2/Git-Internals-Maintenance-and-Data-Recovery
https://git-scm.com/book/zh/v1/Git-%E5%B7%A5%E5%85%B7-%E9%87%8D%E5%86%99%E5%8E%86%E5%8F%B2
This command takes a little effort to understand:
$ git filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch filename' --prune-empty -f -- --all
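A flag-by-flag reading (my own annotation, not from the manual):

- --tag-name-filter cat rewrites every tag to point at its rewritten commit; cat passes each tag name through unchanged, so the tags keep their names.
- --index-filter 'git rm -r --cached --ignore-unmatch filename' deletes filename from each commit's index without checking any files out; --ignore-unmatch keeps git rm from failing on commits that never contained the file.
- --prune-empty drops commits that become empty once the file is removed.
- -f forces filter-branch to run even if refs/original/ already holds a backup from an earlier run.
- -- --all applies the rewrite to all refs (every branch and tag), not just the current branch.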
I checked the help manual:

Checklist for Shrinking a Repository

git-filter-branch is often used to get rid of a subset of files, usually with some combination of --index-filter and --subdirectory-filter. People expect the resulting repository to be smaller than the original, but you need a few more steps to actually make it smaller, because git tries hard not to lose your objects until you tell it to. First make sure that:

- You really removed all variants of a filename, if a blob was moved over its lifetime. git log --name-only --follow --all -- filename can help you find renames.
- You really filtered all refs: use --tag-name-filter cat -- --all when calling git-filter-branch.

Then there are two ways to get a smaller repository. A safer way is to clone, that keeps your original intact.

- Clone it with git clone file:///path/to/repo. The clone will not have the removed objects. See Section G.3.21, “git-clone(1)”. (Note that cloning with a plain path just hardlinks everything!)

If you really don't want to clone it, for whatever reasons, check the following points instead (in this order). This is a very destructive approach, so make a backup or go back to cloning it. You have been warned.

- Remove the original refs backed up by git-filter-branch: say git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d.
- Expire all reflogs with git reflog expire --expire=now --all.
- Garbage collect all unreferenced objects with git gc --prune=now (or if your git-gc is not new enough to support arguments to --prune, use git repack -ad; git prune instead).
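The safer path boils down to one command (repo-shrunk as the target directory is my own placeholder):

$ git clone file:///path/to/repo repo-shrunk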
How to Shrink a Git Repository
http://stevelorek.com/how-to-shrink-a-git-repository.html
Our main Git repository had suddenly ballooned in size. It had grown overnight to 180MB (compressed) and was taking forever to clone.
The reason was obvious; somebody, somewhere, somewhen, somehow, had committed some massive files. But we had no idea what those files were.
After a few hours of trial, error and research, I was able to nail down a process to identify the large files, clean them out of the repository's history, and push the cleaned repository back to the remote.
This process should never be attempted unless you can guarantee that all team members can produce a fresh clone. It involves altering the history and requires anyone who is contributing to the repository to pull down the newly cleaned repository before they push anything to it.
Deep Clone the Repository
If you don't already have a local clone of the repository in question, create one now:
Now—you may have cloned the repository, but you don't have all of the remote branches. This is imperative to ensure a proper 'deep clean'. To do this, we'll need a little Bash script:
Thanks to bigfish on StackOverflow for the original script.
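The script body itself did not survive in this issue. A minimal sketch of what it does, creating a local tracking branch for every branch on the remote (the origin remote name and the master exclusion are assumptions), looks like this:

#!/bin/bash
# Create a local tracking branch for each branch on the remote,
# skipping the symbolic HEAD entry and master, which already exists locally.
for branch in $(git branch --all | grep '^ *remotes/' | grep -Ev '(HEAD|master)$'); do
    # strip the "remotes/origin/" prefix to get the local branch name
    git branch --track "${branch#remotes/origin/}" "$branch"
done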
Copy this code into a file, chmod +x filename.sh, and then execute it with ./filename.sh. You will now have all of the remote branches as well (it's a shame Git doesn't provide this functionality).
Discovering the large files
Credit is due to Antony Stubbs here; his Bash script identifies the largest files in a local Git repository. [The script itself was omitted from this issue.]
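A condensed sketch of the approach his script takes: list every object in the pack files with git verify-pack, sort by size, keep the biggest entries, then map each SHA-1 back to a path with git rev-list (the top-10 cutoff is an assumption):

#!/bin/bash
# Show the largest objects in the repository's pack files.
IFS=$'\n'   # iterate over the verify-pack output one line at a time

# every packed object with its sizes, sorted by object size (field 3), top 10
objects=$(git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3 -n -r | head -10)

echo "size(kB) pack(kB) SHA location"
for line in $objects; do
    size=$(( $(echo "$line" | awk '{print $3}') / 1024 ))    # uncompressed size
    packed=$(( $(echo "$line" | awk '{print $4}') / 1024 ))  # compressed size inside the pack file
    sha=$(echo "$line" | awk '{print $1}')
    # map the SHA-1 back to a filename somewhere in history
    path=$(git rev-list --all --objects | grep "$sha" | awk '{print $2}')
    echo "$size $packed $sha $path"
done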
Execute this script as before, and you'll see some output similar to the below: All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file.
Yep - looks like someone has been pushing some rather unnecessary files somewhere! Including a lovely 1.1GB present in the form of a SQL dump file.
Cleaning the files
Cleaning the file will take a while, depending on how busy your repository has been. You just need one command to begin the process:
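It is the same command quoted at the top of this issue; replace filename with the path you want to purge from history:

$ git filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch filename' --prune-empty -f -- --all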
This command is adapted from other sources—the principal addition is --tag-name-filter cat which ensures tags are rewritten as well.
After this command has finished executing, your repository should now be cleaned, with all branches and tags intact.
Reclaim space
While we may have rewritten the history of the repository, those files still exist in there, stealing disk space and generally making a nuisance of themselves. Let's nuke the bastards:
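These are the same destructive steps as in the man-page checklist quoted above: drop the refs/original/ backups left by filter-branch, expire every reflog, then garbage-collect the now-unreferenced objects (--aggressive just makes the repack more thorough):

$ git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now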
Now we have a fresh, clean repository. In my case, it went from 180MB to 7MB.
Push the cleaned repository
Now we need to push the changes back to the remote repository, so that nobody else will suffer the pain of a 180MB download.
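Assuming the remote is named origin; --force is required because the history has been rewritten:

$ git push origin --force --all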
The --all argument pushes all your branches as well. That's why we needed to clone them at the start of the process.
Then push the newly-rewritten tags:
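Again assuming origin as the remote:

$ git push origin --force --tags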
Tell your teammates
Anyone else with a local clone of the repository will need to either use git rebase, or create a fresh clone, otherwise when they push again, those files are going to get pushed along with it and the repository will be reset to the state it was in before.