Closed KoningLeon closed 1 month ago
It sounds like you took a guess at what was taking up space and removed some files, but a lot of your space is in other files. Run python $gfr --analyze from your project, then look at the files in the $GIT_DIR/filter_repo/analysis report directory that the run creates. They should tell you what is large.
I knew which files were the problem because they are the only ones that hold data. The repo only takes up 80 MB once the troublesome files are removed. Somehow, though, that is not reflected in the .pack file, which isn't shrinking.
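One way to check whether the pack actually shrank is to total the .pack files before and after a run. A minimal Python sketch, assuming the usual objects/pack layout inside the Git directory; the function name is made up for illustration:

```python
from pathlib import Path


def total_pack_bytes(git_dir):
    """Sum the sizes of all *.pack files under <git_dir>/objects/pack."""
    pack_dir = Path(git_dir) / "objects" / "pack"
    if not pack_dir.is_dir():
        return 0  # not a Git directory, or nothing has been packed yet
    return sum(p.stat().st_size for p in pack_dir.glob("*.pack"))
```

Calling total_pack_bytes on the mirror clone's directory before and after filtering gives two numbers to compare; for a --mirror clone the Git directory is the clone directory itself.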
I did, however, take your advice and ran the analyze command, and I might have found something that could explain why the pack isn't shrinking. Some large files still show up in path-all-sizes as <present> even though the files and folders are no longer part of the repo.
The same goes for directories-all-sizes: the marked folders are no longer part of the repo, yet they are still marked as <present>.
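To pull those entries out of the report without scrolling, something like the following may help. It is only a sketch: it assumes each data line in path-all-sizes has four whitespace-separated columns (unpacked size, packed size, deletion date or <present>, path), so check the header of your own report first. The function name and the sample text are invented for illustration.

```python
def present_paths(report_text, min_packed_bytes=0):
    """Return (packed_size, path) for rows whose third column is <present>."""
    rows = []
    for line in report_text.splitlines():
        parts = line.split(None, 3)  # keep spaces inside the path intact
        if len(parts) != 4 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue  # header, blank line, or unexpected layout
        unpacked, packed, deleted, path = parts
        if deleted == "<present>" and int(packed) >= min_packed_bytes:
            rows.append((int(packed), path))
    return sorted(rows, reverse=True)  # largest packed size first


# Invented sample in the assumed layout, not real output.
sample = """\
=== All paths by reverse accumulated size ===
Format: unpacked size, packed size, date deleted, path name
  9000000  4000000 <present>  reports/sales/cache.abf
  5000000  2000000 2023-01-15 reports/old/model.pbix
      120       80 <present>  README.md
"""
for packed, path in present_paths(sample, min_packed_bytes=1000):
    print(packed, path)
```

Any large path that still comes back from this after a filtering run is one the run did not remove.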
Managed to get the desired result by doing:
Resulting in our repo going from 13.2 GB to 150 MB. This means losing the entire history for that specific folder, but that is a sacrifice we are willing to make.
Any chance you were using CMD to run your commands? If so, the problem may be that you used single quotes (') instead of double quotes ("). If you changed your command from:
python $gfr --invert-paths --path-glob '*/cache.abf' --path-glob '*.pbix'
to
python $gfr --invert-paths --path-glob "*/cache.abf" --path-glob "*.pbix"
that might have fixed things for you. Apparently (as I learned in #435), the former will cause CMD to tell git-filter-repo that you want to remove files matching '*/cache.abf' and '*.pbix' (quote characters included), which you obviously don't have any of, while the latter correctly tells git-filter-repo that you want to remove files matching */cache.abf and *.pbix.
To my knowledge, this is unique to CMD; single quotes work fine in any other shell and don't do this crazy weirdness.
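The difference can be illustrated in Python. This is only a sketch: shlex.split stands in for a POSIX-style shell, fnmatch stands in for glob matching in general (not necessarily what git-filter-repo does internally), and the file path is made up.

```python
import fnmatch
import shlex

# A POSIX-style shell strips the single quotes before the program sees them.
posix_args = shlex.split("--path-glob '*.pbix'")
print(posix_args)  # ['--path-glob', '*.pbix']

# If the quote characters are passed through literally (as described for CMD),
# the pattern itself starts and ends with a quote character.
literal_pattern = "'*.pbix'"
clean_pattern = posix_args[1]

path = "reports/sales.pbix"  # hypothetical path
print(fnmatch.fnmatchcase(path, clean_pattern))    # True
print(fnmatch.fnmatchcase(path, literal_pattern))  # False
```

With the quotes left in, the pattern can only match a path that itself begins and ends with a quote character, so nothing gets removed.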
No, I was using the PowerShell terminal from within VS Code.
Well, in that case, I'd suggest adding a --debug flag to your command so we can see what git-filter-repo actually saw; I have no idea whether VS Code did some weird interpretation either. It would also be nice to see the large paths from the --analyze report both before and after you run git-filter-repo with the --debug flag.
That said, it sounds like you did find a solution, so if you don't want to debug further that's fine. But if you'd like to know what happened, the --debug output is the next piece of output I'd need.
No further response so I'll close out. I'm glad you found a solution. If you would like to dig further, feel free to reopen and provide the other bits of info I requested.
In the past my team stored reports, including their data, in our Azure DevOps Git repo, which resulted in a size of 13.2 GB. Thankfully we saw the light and bettered our ways last year, so the repo hasn't contained any reports with data for a while now. I wanted to use your tool to also remove the history of those files, for the sake of repo size and security. Unfortunately I haven't been able to reduce the size of the pack files so far. I must admit I am far from a Git guru, so assume my knowledge is very limited :)
What I've done:
git clone --mirror
python $gfr --invert-paths --path-glob '*/cache.abf'
and
python $gfr --invert-paths --path-glob '*.pbix'
(these are the file types that hold the data)
The logging for your tool gives me the impression that any old and unneeded files are cleaned before repacking, but maybe I've missed some flag or git command I'm supposed to run.
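If shell quoting is a concern, one option is to drive the tool from Python and pass the arguments as a list, so that no shell interprets the quotes. A minimal sketch, using only the flags from the commands above and putting both globs into a single run; the function names and the script path are placeholders, not part of the tool:

```python
import subprocess
import sys


def build_command(gfr_script, globs):
    """Build the argument list for one --invert-paths run over all globs."""
    cmd = [sys.executable, gfr_script, "--invert-paths"]
    for pattern in globs:
        cmd += ["--path-glob", pattern]  # passed as-is, no shell quoting
    return cmd


def run_filter(gfr_script, globs, repo_dir):
    """Run the command inside repo_dir; raises if the tool exits non-zero."""
    subprocess.run(build_command(gfr_script, globs), cwd=repo_dir, check=True)


# Placeholder script path; this line only prints the arguments, it runs nothing.
print(build_command("git-filter-repo", ["*/cache.abf", "*.pbix"])[2:])
```

Printing the list first is a cheap way to confirm exactly which patterns the tool will receive before rewriting any history.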