schacon / hg-git

mercurial to git bridge, pushed to directly from the hg-git plugin in Hg
GNU General Public License v2.0

Unable to push large Hg repository to git #87

Closed: nevali closed 10 years ago

nevali commented 14 years ago

Hi,

I’m attempting to create a git mirror of a rather large Hg repository (OpenSolaris’ onnv-gate). I’ll say from the outset that I wasn’t sure if it’d work, but figured I’d give it a try.

I’ve got a clone of onnv-gate sat in one directory, and an empty bare git repository in another. My “paths” section includes:

default-push = file:///repo/git/onnv.git

I set an “hg push” going about 8 hours ago and the last output was “importing Hg objects into Git”, and it doesn’t seem to have done a huge amount besides consuming memory :) The target git repository remains empty (according to 'du', which says it’s 38KB).

Doing an “hg -v push” doesn’t look right at all (but I’m not really an Hg user, so I don’t know):

pushing to file:///repo/git/onnv.git
importing Hg objects into Git
converting revision ?X?H`??1"䃷
                           ??r?-
converting revision ??n???Qp;??.?V]rh[
converting revision ?ڮ??D???a?ۋ?㱽??%
converting revision ???葉'Uü?2?1c??
converting revision 7??w??1D?/7?(j`??

(That was a 30-second run, after which I interrupted it).

Is there a way to make this work? If it will work, but will just take a very long time, I can set it running in a screen session and leave it, but I didn’t want to do that if it’s going to sit there indefinitely, as it looked like it was doing. Or is the repository simply too big for hggit to handle right now?

All of this does go some way to indicating why there’s no git mirror of the OpenSolaris source tree lying around anywhere :)

Cheers,

Mo.

nevali commented 14 years ago

I just noticed the .hg/git directory and the .hg/git-mapfile file. In this case .hg/git is 250MB, while .hg/git-mapfile is itself 3.8MB, which suggests that the push was definitely working. If I re-start it, will it be able to carry on where it left off, or is that wishful thinking? :)

rctay commented 14 years ago

Hi,

the gibberish in the "converting revision" messages is expected - it's a bug in hg-git. (I've fixed this in my fork, and I'm still waiting for a response to my pull request.)

From my experience, yes, you can kill the conversion and start it again - but don't take my word for it. :)

abderrahim commented 14 years ago

OK, I think this needs a bit of explaining. When pushing to a git repository, what happens is:

  1. Everything is converted to git. This step can be run on its own using hg gexport. It may take a long time if the repository is very large, and it is safe to interrupt (it will resume from where it left off). The result is saved in .hg/git; you may want to repack it after it finishes. There is a script and a patch for using git-fast-import; I'll try to see if they still work and post them later.
  2. The converted objects are actually pushed to the destination. This needs to put everything in one pack, and a bug in dulwich may make it use too much memory (I have some patches for this). You may need to split this into parts (e.g. 500 revs at a time) by using something like hg book -fr master hg push -r master
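
The chunked push from step 2 could be sketched roughly as follows. This is only a sketch under assumptions: the bookmark is called master, the script runs from the root of the hg clone, and default-push is already configured. The function names chunk_points and push_in_chunks are made up for illustration, and 500 is just the chunk size mentioned above.

```shell
#!/bin/sh
# Sketch only: push an hg-git repository a few hundred revisions at a time,
# so that each pack the push has to build stays small.

# Print the revision numbers to stop at: every $2 revisions, ending on tip $1.
chunk_points() {
    tip=$1
    step=$2
    rev=$((step - 1))
    while [ "$rev" -lt "$tip" ]; do
        echo "$rev"
        rev=$((rev + step))
    done
    echo "$tip"    # always finish on the tip itself
}

# Move the (assumed) master bookmark forward chunk by chunk, pushing each time.
push_in_chunks() {
    step=${1:-500}
    tip=$(hg log -r tip --template '{rev}') || return 1
    for rev in $(chunk_points "$tip" "$step"); do
        hg bookmark -f -r "$rev" master || return 1
        # Stop at the first failed push so the chunk can be retried.
        hg push -r master || return 1
    done
}
```

Calling push_in_chunks 500 from the clone would then walk the bookmark up to tip. A rerun may need a different starting point, since hg push can also exit non-zero when there is nothing new to send.
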

rctay commented 14 years ago

I usually kill hg when I see that it's done converting the hg objects into git ones, and then push it with git manually.

abderrahim: I'd be interested to see if you've got anything to make dulwich push faster.

nevali commented 14 years ago

Thanks for all of the pointers and explanations. I realised what the process was after digging around in the Hg repo and discovering .hg/git. du tells me it’s growing, but incredibly slowly: about 10MB/hour (and this is on a pretty fast, idle server). If the patch for using git-fast-import works and will speed this part of it up, I would very much appreciate it :)

(At the time of writing, .hg/git stands at about 500MB, which is within 10-20MB of what I'd expect it to be once it's completed, but this is educated guesswork, so anything to speed this up in the event that I’m dead wrong would be handy ;)

nevali commented 14 years ago

Okay, several days later, it's now completed "Importing Hg objects into Git". It subsequently fell over with "abort: out of memory" during the "creating and sending data" phase, but I did also have a git-svn fetch running which also seems to be buggy in slightly interesting resource-related ways. I'll give it another try, but look to split it up if it fails again.

Out of interest, once the revs have been converted, does 'hg push' do anything except the equivalent of a 'git push' from within the .hg/git directory? That is, would a native 'git push --mirror /path/to/remote' do exactly what 'hg push' is trying to do, without any ill effects later on down the line?

rctay commented 14 years ago

You need to know when it's finished updating the refs and starts pushing. Myself, I've hacked my hg-git installation to print a message when it's pushing to the git repository. When I see it, I kill hg and run

git --git-dir=.hg/git push git://foo.com/repo.git master

To be sure, you can just pass --dry-run to see what's being pushed first.
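
Put together, that might look something like the sketch below. The function name manual_git_push and its arguments are hypothetical; the remote is whatever URL or path you push to, and it assumes the conversion into .hg/git has already finished and that you run it from the root of the hg clone.

```shell
#!/bin/sh
# Sketch only: push the already-converted objects in .hg/git with plain git.
# $1 = remote URL or path (placeholder), $2 = ref to push (defaults to master).
manual_git_push() {
    remote=$1
    ref=${2:-master}
    # First show what would be sent, without transferring anything.
    git --git-dir=.hg/git push --dry-run "$remote" "$ref" || return 1
    # Then do the real push.
    git --git-dir=.hg/git push "$remote" "$ref"
}
```

For example, manual_git_push git://foo.com/repo.git would check and then push master.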

nevali commented 14 years ago

I gave it a shot anyway :) It's a larger repo than I can be bothered spending the time running 'du' on (it took about 90 minutes on the 'counting objects' phase of the 'git push', though in fairness to it I was again running a big 'git svn fetch' in the background).

If the conversion step can be invoked by way of 'hg gexport' and 'git push' can be used to do the work of actually pushing to the remote, presumably I could dispense with 'hg push' altogether. This is all going to be automated as part of a cron job, so being longer-winded isn't a concern!

abderrahim commented 14 years ago

You may want to try the pack branch of my fork of dulwich. It's way slower than the current implementation but way less memory/bandwidth hungry (if you're pushing to a local repo, not being bandwidth hungry doesn't mean any gain).

And yes, you could get away with hg gexport + git push, but you may want to take care with regard to branches: hg-git won't push named branches, it'll only push bookmarks. You can do something like hg book -fr for each branch before pushing (maybe replacing default with master). And git push --mirror should also be fine.
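
For the cron job, the whole thing could be sketched along these lines. Treat it as a rough sketch with several assumptions: the function name mirror_via_git is made up, it runs from the root of the hg clone, the installed Mercurial accepts -T on hg branches, hg gexport refreshes the git refs from the bookmarks, and the revset max(branch('name')) is used here as an approximation of each branch's tip.

```shell
#!/bin/sh
# Sketch only: bookmark every named branch, convert, then push with plain git.
# $1 = path or URL of the git mirror (placeholder).
mirror_via_git() {
    remote=$1
    # hg-git only exports bookmarks, so give each named branch a bookmark.
    hg branches -T '{branch}\n' | while read -r branch; do
        name=$branch
        # Git users expect "master" where Mercurial has "default".
        [ "$branch" = "default" ] && name=master
        # -f lets the bookmark share its name with an existing branch.
        hg bookmark -f -r "max(branch('$branch'))" "$name" </dev/null || exit 1
    done || return 1
    # Convert any new changesets into .hg/git (resumes if interrupted earlier).
    hg gexport || return 1
    # Push everything in .hg/git to the mirror.
    git --git-dir=.hg/git push --mirror "$remote"
}
```

A crontab entry would then cd into the clone, run hg pull, and call mirror_via_git /path/to/remote. Branch names containing quote characters would need extra escaping in the revset.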