vlsi closed this issue 5 years ago.
I just pushed a cleaned-up repo. I did get some errors for the pull-request-related refs:
! [remote rejected] refs/pull/1/head -> refs/pull/1/head (deny updating a hidden ref)
! [remote rejected] refs/pull/1/merge -> refs/pull/1/merge (deny updating a hidden ref)
...
@vlsi let me know if this comes across clean for you. I do have a backup of the old repo if we need to restore it.
That was fast.
There were the following folders as well: Jenna-2.6.3, commons-lang-2.3. Do you need them?
For instance:
java -jar bfg.jar --delete-folders Jenna-2.6.3
java -jar bfg.jar --delete-folders commons-lang-2.3
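BFG only rewrites commits. The objects for the deleted folders stay on disk until the reflog is expired and git gc prunes them, which is why BFG's own documentation recommends a reflog-expire plus gc step after a run. Below is a minimal sketch of that housekeeping step. It assumes git is on PATH and that the BFG rewrite has already been done. The function name cleanup_after_bfg is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: housekeeping after a BFG run such as
#   java -jar bfg.jar --delete-folders Jenna-2.6.3
# BFG rewrites history, but the old objects remain until the reflog
# entries pointing at them expire and git gc prunes them.
cleanup_after_bfg() {
  repo="$1"
  # Expire every reflog entry now, so the commits BFG replaced
  # become unreachable.
  git -C "$repo" reflog expire --expire=now --all || return
  # Repack the remaining objects and delete unreachable ones immediately.
  git -C "$repo" gc --prune=now --aggressive
}
```

Usage, assuming a mirror clone named repo.git: run `cleanup_after_bfg repo.git`, then push the rewritten refs as usual.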
Note: by default BFG does not keep a reference to "former commit ids" when deleting files (it assumes the deleted files contain private data, so it hides the previous commit ids to prevent someone from brute-forcing that data).
You might use my BFG release, https://github.com/vlsi/bfg-repo-cleaner/releases, which has a --no-private flag, so you could keep the "former-commit-id" header (see a sample here: https://github.com/apache/jmeter/commit/7561325be56c0481488da4d0307885611017acb6 ).
PS. Which software do you use to produce LICENSE.spdx / SPDXParser.spdx? It looks like every time you save the file its contents are randomly shuffled, so git thinks you are creating a multi-megabyte file "from scratch".
> I did get some errors for the pull-request-related refs:
That is expected. GitHub does not allow updating pull/* refs.
Since it looks like my rewriting the history tripped up ORT (heremaps/oss-review-toolkit), and the repository is now a reasonable size, I think I'll just leave the two unused Jenna and commons-lang folders.
@vlsi Thanks for the suggestion and info.
> PS. Which software do you use to produce LICENSE.spdx / SPDXParser.spdx? It looks like every time you save the file its contents are randomly shuffled, so git thinks you are creating a multi-megabyte file "from scratch".
It is produced by the SPDX Maven Plugin. Since it is in RDF format and the Jena libraries do not preserve any ordering, the file is completely regenerated each time. I'm thinking it should be removed from the source directory entirely and stored only in the release artifacts, including Maven Central, Bintray and the GitHub release artifacts, rather than kept in a directory under source control.
I'll open a new issue for this.
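When a generator reorders its output, a plain diff reports the whole file as changed. A rough check is to compare the two versions after sorting their lines. This assumes the shuffling happens at the level of whole lines, and RDF/XML nesting can make it only an approximation. The helper name same_lines_ignoring_order is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when files $1 and $2 contain the
# same lines, possibly in a different order; exits 1 when they differ and
# 2 when a file cannot be read. Only a heuristic for spotting "shuffled"
# regenerations such as LICENSE.spdx.
same_lines_ignoring_order() {
  # LC_ALL=C gives a byte-wise, locale-independent sort order.
  sorted_a=$(LC_ALL=C sort "$1") || return 2
  sorted_b=$(LC_ALL=C sort "$2") || return 2
  [ "$sorted_a" = "$sorted_b" ]
}
```

For example, `same_lines_ignoring_order old/LICENSE.spdx LICENSE.spdx && echo "only reordered"` (the paths are placeholders).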
> I'm thinking it should be removed from the source directory entirely and stored only in the release artifacts, including Maven Central
Please do that if the file is not required (and remove it from historical commits as well).
> the Jena libraries do not preserve any ordering, the file is completely regenerated
I think it would be valid to raise an issue with Jena (or the SPDX Maven Plugin) asking for explicit ordering, so that the build artifacts become reproducible.
> Please do that if the file is not required (and remove it from historical commits as well).
Please do not rewrite the history again; it's bad practice for public repositories. @vlsi: you didn't explain why the repository size is an issue. If it's about clone performance, e.g. on CI, why not use a shallow clone?
> If it's about clone performance
Clone performance for testing purposes (e.g. running tests). Other operations, such as commit, are affected as well, because git needs to run GC from time to time, and GC is slower on a larger repository.
There's a side concern as well: disk space for all involved parties (everyone who clones, Travis, GitHub, etc.).
> it's bad practice for public repositories
By the way, you didn't explain why this specific repo must not be rebased.
> Clone performance for testing purposes (e.g. running tests).
For this, as suggested, you can use shallow clones, because you usually don't need the commit history to run tests. There is no need to rewrite the history.
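A shallow clone fetches only the most recent commit, which is usually all a CI job needs to run tests. Below is a minimal sketch. It assumes git is installed. The helper name shallow_clone and its arguments are placeholders, not the project's actual setup.

```shell
#!/bin/sh
# Hypothetical helper: clone only the tip commit of the default branch.
# --depth 1 truncates history and implies --single-branch, so other
# branches are not fetched either.
# Usage: shallow_clone "$REPO_URL" work-dir
shallow_clone() {
  git clone --quiet --depth 1 "$1" "$2"
}
```

If the full history is needed later, `git fetch --unshallow` inside the clone converts it into a complete one. Note that for a repository on the local disk, git ignores --depth unless the source is given as a file:// URL.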
> Other operations, such as commit, are affected as well, because git needs to run GC from time to time, and GC is slower on a larger repository.
> There's a side concern as well: disk space for all involved parties (everyone who clones, Travis, GitHub, etc.).
This sounds rather theoretical: performance issues with git commit in a 62M repository? What I was getting at with my question was: do you actually have issues with the size of the repository, or is this premature optimization?
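To ground the size discussion in numbers, git can report how much disk space its packed object store uses. Below is a small sketch. It assumes git is installed. The helper name pack_size_kib is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the disk space used by the packs of
# repository $1, in KiB, as reported by `git count-objects -v`.
# Loose (unpacked) objects are reported separately as "size:" and are
# not included here.
pack_size_kib() {
  git -C "$1" count-objects -v | awk '/^size-pack:/ { print $2 }'
}
```

For a human-readable overview, `git count-objects -vH` prints the same fields with units.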
> By the way, you didn't explain why this specific repo must not be rebased.
Nothing special about this repository, just the general issue that all forks and local copies of the repository are now out of sync, which causes extra work for contributors. Balancing the gain and cost of rewriting history is, of course, up to @goneall.
> do you actually have issues with the size of the repository
It was so slow to download that I went ahead and created an issue.
> Nothing special about this repository
This repo has 0 PRs and just 38 forks, so it should not hurt much.
It makes sense to remove the jars from the repository so that it is simpler to use.
Top jar consumers are:
https://rtyley.github.io/bfg-repo-cleaner/ can help with file removal:
java -jar bfg.jar --delete-files '*.jar'
This results in a 20 MiB repository (a 10x reduction).
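A "top jar consumers" list can be reproduced with plain git by walking every object reachable from any ref and sorting the blobs by size. Below is a sketch. It assumes git is installed. The helper name largest_blobs is hypothetical, and the sizes it reports are uncompressed blob sizes in bytes, not the compressed size inside the pack.

```shell
#!/bin/sh
# Hypothetical helper: list the N largest blobs (default 10) reachable
# from any ref in repository $1, one "<size-in-bytes> <path>" per line.
# %(objectsize) is the uncompressed size; %(rest) echoes the path that
# rev-list --objects printed after the object id.
largest_blobs() {
  git -C "$1" rev-list --objects --all |
    git -C "$1" cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob [0-9a-f]* //p' |
    sort -n -r |
    head -n "${2:-10}"
}
```

Keep in mind that by default BFG leaves the files in the latest commit on HEAD untouched. Any jars still present there have to be removed with an ordinary commit before running `java -jar bfg.jar --delete-files '*.jar'`.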