newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

[Suggestion] Add branch names to analysis files, if possible #514

Closed pww217 closed 4 months ago

pww217 commented 1 year ago

Hey there!

First off, this is a great tool and thank you for maintaining it.

We had a problem with one of our repos growing out of control, up to a size of 15 GB, which naturally was breaking a lot of our tooling and causing serious delays.

We used this tool's --analyze function and it did eventually lead us to the answer, but here's the kicker:

The offending branch that was storing all the large files was the gh-pages branch, detached from main. We were very confused, because we could not find where in the world these large files were being committed.

Eventually PyCharm's git tooling allowed us to discover that it was in fact this detached branch that was the offender. Upon deleting it the size went down 99%.

I remarked after we had figured this all out that the analysis files, while very useful (don't get me wrong), would have led us to an answer much sooner if they had mentioned which branch each file was deleted from (we had just assumed it was main).

So in hopes of saving others similar frustration in the future, I'd suggest adding branches to these files, if at all possible and not too difficult.

All that said, we'd still have this problem if not for your tool, so thank you again!

newren commented 1 year ago

I'm glad the tool was able to help you, at least part way.

You have an interesting idea, but in practice I'm not sure it'd work very well except in very unusual circumstances such as yours. For the typical repository, this would result in listing virtually every branch name along with every single file, since older files are typically part of every branch in a repository. That'd basically just be super noisy and distracting.

Perhaps I could instead document somewhere that people should use git log --all --source -- ${FILENAMES} to help them find where files were introduced?

Of course, even that can be misleading: when a commit can be reached from multiple refs, that command lists only one of them. It would still help people find big stuff that is not in the history of HEAD but is in the part of some branch that has diverged from HEAD, or in some branch that is completely orthogonal to the history of HEAD.
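
For readers who want to try the suggestion, here is a minimal sketch in a throwaway repository. The repository layout, the file name big.bin, and the branch name gh-pages-copy are hypothetical and exist only for illustration. The git for-each-ref --contains step is not from the comment above; it is one possible way to work around the "only one ref is listed" caveat.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repository; every name below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Ordinary history on the default branch.
echo "hello" > README
git add README
git commit -q -m "Add README"

# An orphan branch that shares no history with the default branch,
# standing in for the gh-pages branch from the report.
git checkout -q --orphan gh-pages
git rm -q -rf .
echo "pretend this is huge" > big.bin
git add big.bin
git commit -q -m "Add big.bin"

# A second ref pointing at the same commit, to expose the caveat.
git branch gh-pages-copy

# The command from the comment: each matching commit is printed
# together with ONE ref from which it was reached.
out=$(git log --all --source --oneline -- big.bin)
echo "$out"

# Assumed workaround: list every branch that contains that commit.
commit=$(git log --all --format=%H -- big.bin)
all_refs=$(git for-each-ref --contains "$commit" --format='%(refname)' refs/heads)
echo "$all_refs"
```

The first command prints a single line naming only one of the two branches, even though both point at the commit. The second lists both, at the cost of an extra lookup per commit.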

pww217 commented 1 year ago

Thank you for the response. I completely understand the additional complexity with maybe not a lot of payoff. Our problem really was a unique one.

Perhaps this isn't the right feature to add then. There are other tools that can search remote branches (in our case PyCharm, possibly VS Code with some extensions), and I know all too well what feature bloat can do to a tool, so I can respect wanting to "keep it simple" as much as possible.

I appreciate your time. Whatever you choose to do, feel free to close this issue.