rtyley / bfg-repo-cleaner

Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala
https://rtyley.github.io/bfg-repo-cleaner/
GNU General Public License v3.0
11.17k stars 550 forks

Update documentation to explain how to mirror-push up to a GitHub repo with pull-requests #16

Open rtyley opened 11 years ago

rtyley commented 11 years ago

The problem manifests like this:

$ git fetch -q && git push -q --mirror github
remote: error: hook declined to update refs/pull/1001/head
remote: error: hook declined to update refs/pull/1001/merge
(snip)
remote: error: hook declined to update refs/pull/957/head
remote: error: hook declined to update refs/pull/957/merge
To git@github.com:pdurbin/openscholar.git
 * [new branch]      1017 -> 1017
(snip)
 * [new branch]      origin/SCHOLAR-3.x-make-1072 -> origin/SCHOLAR-3.x-make-1072
 * [new tag]         SCHOLAR-2-0-BETA1 -> SCHOLAR-2-0-BETA1
(snip)
 * [new tag]         SCHOLAR-3.1.6 -> SCHOLAR-3.1.6
 ! [remote rejected] refs/pull/1001/head -> refs/pull/1001/head (hook declined)
 ! [remote rejected] refs/pull/1001/merge -> refs/pull/1001/merge (hook declined)
(snip)
 ! [remote rejected] refs/pull/957/head -> refs/pull/957/head (hook declined)
 ! [remote rejected] refs/pull/957/merge -> refs/pull/957/merge (hook declined)
error: failed to push some refs to 'git@github.com:pdurbin/openscholar.git'

See also:

http://christoph.ruegg.name/blog/2013/1/26/git-howto-mirror-a-github-repository-without-pull-refs.html

Cleaning refs that are - effectively - from other people's repos might not be possible, but we could consider renaming the refs to give a "here's how you fix your pull request" ref.

jlukic commented 11 years ago

I just ran into this while trying out bfg on Semantic UI

rtyley commented 11 years ago

It's a fiddly problem - you can update all the 'real' refs in your repo, but all the ones beginning 'refs/pull' are synthetic read-only refs created by GitHub - you can't update (and therefore 'clean') them if they're from outside your repository (like https://github.com/jlukic/Semantic-UI/pull/183, for instance).

So, if you're pushing your updated refs up to your repository, all the non-pull-request refs are accepted and fixed, but the Pull Request ref updates will be rejected.

I'm really not sure if there's any way the BFG can make this experience nicer... as an open question, can you think of something you would like to happen?

jlukic commented 11 years ago

I think I might have done something terribly destructive...

I ran and pushed to remote with --strip-biggest-blobs 500 instead of --strip-blobs-bigger-than 1M

And now my previous commit history is littered with missing files https://github.com/jlukic/Semantic-UI/tree/9c2d248a1db821560aba68446d92eeef12087e3e/build/packaged/javascript

Discussion on the Semantic UI issue tracker: https://github.com/jlukic/Semantic-UI/issues/220

jlukic commented 11 years ago

I assumed the refs didn't push because the pull request refs were rejected. I wish that git gave better notice here.

I ended up cloning without pull refs using the guide above http://christoph.ruegg.name/blog/git-howto-mirror-a-github-repository-without-pull-refs.html

I think just pointing the issue out in the docs would be enough for other users, or maybe a separate walkthrough on how to clone while excluding pull refs.
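For anyone wanting to try the "exclude pull refs" approach, here is a minimal sketch of the idea, not an official BFG workflow. It runs entirely against throwaway local repositories: `seed`, `github.git` and `clean.git` are hypothetical names, and `github.git` is only a local stand-in for the real GitHub remote (it cannot reproduce GitHub's hook rejection). The point it illustrates is that fetching and pushing only `refs/heads/*` and `refs/tags/*` with explicit refspecs, rather than `--mirror`, never touches `refs/pull/*`.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Local stand-in for the GitHub repository: one branch, one tag, one pull ref.
git init -q "$work/seed"
git -C "$work/seed" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$work/seed" tag v1
git -C "$work/seed" update-ref refs/pull/1/head HEAD
git clone -q --mirror "$work/seed" "$work/github.git"

# Fetch only branches and tags, so no refs/pull/* reach the working copy.
git init -q --bare "$work/clean.git"
git -C "$work/clean.git" fetch -q "$work/github.git" \
    '+refs/heads/*:refs/heads/*' '+refs/tags/*:refs/tags/*'

# (The BFG would be run against clean.git at this point.)

# Push back the same two namespaces explicitly instead of using --mirror.
git -C "$work/clean.git" push -q --force "$work/github.git" \
    'refs/heads/*:refs/heads/*' 'refs/tags/*:refs/tags/*'

# List the refs that ended up in the working copy.
git -C "$work/clean.git" for-each-ref --format='%(refname)'
```

Against a real repository, the `github.git` path would be replaced by the GitHub URL. The pull request refs on the remote are left exactly as they were, so they will still point at the old, uncleaned history.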

rtyley commented 11 years ago

Sorry to hear about the problems with the update of Semantic-UI. If your intended threshold was --strip-blobs-bigger-than 1M it looks like there would have been ~85 objects to be removed - obviously those files would still have changed to become file.REMOVED.git-id files in your history.

One thing the BFG does to protect you from unintended consequences is not alter the contents of your latest (HEAD) commit - so your current work can never be lost.

As it happens, your old history and files are pretty much not lost, precisely because of GitHub not allowing Pull Request refs to be overridden - meaning that in GitHub, at least as of Friday, refs that refer to your old history still existed, and doing a --mirror clone would still retrieve all of the history that was in the repo up to the point of those PRs being requested. If all your pull-requests have been updated, that may no longer be the case, but I took a mirror clone on Friday when the data was available, and would be happy to send it to you if you want to go through the (rather fiddly) process of restoring your history to how it was before you ran the BFG. The hassle involved with updating all of your collaborators (once again) may mean it's not really worth doing however.

Incidentally, the usage instructions for the BFG do advise taking a back-up mirror copy of the repo before proceeding - it's always possible to recover from that if the BFG does something that you don't want.

isomorphisms commented 9 years ago

(I was going to make this commit myself, but can't find where the code to rtyley.github.io/bfg-repo-cleaner/ is hosted)

This mistake is probably commoner than it needs to be. Why not change the first example line (which most people will probably copy-paste and run) so that it deletes only the biggest 1 file? Or better yet, --strip-blobs-bigger-than 10000M so that people see what the output looks like without accidentally rming?

(I know it's apparently reversible, but it's not exactly easy to figure out how)

paulschreiber commented 6 years ago

+1 this.