schacon / hg-git

Mercurial to Git bridge, pushed to directly from the hg-git plugin in Hg
GNU General Public License v2.0
620 stars, 71 forks

very slow push performance with large repo #32

Closed: chadrik closed this issue 10 years ago

chadrik commented 14 years ago

I'm working with a large repo (170 MB in my .hg directory) with a lot of history and binary data. Pushing a simple one-line text change to my remote Git repo takes almost 25 minutes. Is the plugin re-uploading the entire repo? Is there anything I can do to prepare my repo in hg to speed this up? Is this problem related to the "thin pack" TODO item? If so, I'm curious what technical hurdle prevented thin packs from being implemented in the first place.

edrex commented 14 years ago

I have a similar issue: pushing my personal ikiwiki repo from the hg-git copy on my laptop takes about 5 minutes. Could a dev comment on how to debug this, or confirm that it is probably the result of a known issue? Thanks -- Eric

edrex commented 14 years ago

I am observing a huge amount of data transferred over the wire for a one-line commit. Is this the currently accepted behavior? It is clearly very broken, and this should be a high-priority issue.

edrex commented 14 years ago

I'm guessing that these two upstream issues are the source of the inefficient network traffic:

https://bugs.launchpad.net/dulwich/+bug/562676
https://bugs.launchpad.net/dulwich/+bug/562673

Probably the first one is the real killer: in my test I was only pushing a single small object, so the fact that it wasn't sent as a delta shouldn't make much of a difference.
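As a rough illustration of how a one-line commit can turn into a huge upload: during a push, the client should send only the objects reachable from the refs being pushed that are not already reachable from the refs the remote advertises. The sketch below models that set difference on a toy commit graph in Python. The commit names and the PARENTS graph are hypothetical, real Git also walks trees and blobs, and this is not dulwich's implementation.

```python
from collections import deque

# Hypothetical toy commit graph: each commit maps to its parent commits.
# Real Git pushes also include the trees and blobs these commits point to.
PARENTS = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],
    "c4": ["c3"],  # the new one-line commit being pushed
}


def reachable(heads, parents):
    """Return every commit reachable from `heads` by following parent links."""
    seen = set()
    queue = deque(heads)
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents[commit])
    return seen


def commits_to_send(wants, haves, parents):
    """Commits the remote is missing: reachable from wants, not from haves."""
    return reachable(wants, parents) - reachable(haves, parents)


# The remote already has c3 and we want it to have c4: only c4 is sent.
print(sorted(commits_to_send(["c4"], ["c3"], PARENTS)))  # ['c4']
# If the client ignores what the remote already has, all history is sent.
print(sorted(commits_to_send(["c4"], [], PARENTS)))  # ['c1', 'c2', 'c3', 'c4']
```

With a long history full of binary data, the gap between those two results is the gap between a tiny push and re-uploading the whole repository.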

matlinuxer2 commented 11 years ago

I have the same problem, too (my current version is 0.3.4).

Is there any update on this issue? Thanks!

keflavich commented 11 years ago

+1: hg-git is extremely slow for some repositories, at least. It all seems to happen after the "creating and sending data" step. Are there any timing checks I can run to help figure out where the block is?
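One way to answer the timing question: Mercurial's global --profile option prints an execution profile for a whole command (for example, hg --profile push). The same idea can be applied to a single function with Python's standard cProfile and pstats modules. In the sketch below, push, slow_step and fast_step are hypothetical stand-ins for the real push code path, not hg-git functions.

```python
import cProfile
import io
import pstats
import time


def slow_step():
    """Hypothetical stand-in for the expensive part of a push."""
    time.sleep(0.05)


def fast_step():
    """Hypothetical stand-in for cheap bookkeeping."""
    return sum(range(1000))


def push():
    fast_step()
    slow_step()


def profile_report(func, limit=5):
    """Run `func` under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
    stats.print_stats(limit)
    return out.getvalue()


print(profile_report(push))
```

Sorting by cumulative time puts the callers that include the slow work at the top, which is how a single dominant call such as a pack upload stands out.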

matlinuxer2 commented 11 years ago

I found that the bottleneck is upload_pack(...) in hggit/git_handler.py (lines 797-800):

    try:
        self.ui.status(_("creating and sending data\n"))
        new_refs = client.send_pack(path, changed, genpack)
        return old_refs, new_refs

hg push spent most of its time in the call client.send_pack(path, changed, genpack).

keflavich commented 11 years ago

That implies the actual bottleneck is in the Dulwich client, or just that too many commits are being marked for upload.

matlinuxer2 commented 11 years ago

I think a lot of duplicate data is being sent in this function call; it takes far too long...

WymzeeLabs commented 11 years ago

I too have this issue. Manually installing and using dulwich-0.9.0 instead of 0.8.7 fixes it for me.

kodawah commented 10 years ago

I can confirm that installing and using dulwich 0.9.0 fixed all network issues for a fairly large repo (800+ MB).
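Since two commenters report that moving from dulwich 0.8.7 to 0.9.0 fixed the slow pushes, a quick first check is which dulwich version is actually installed. The sketch below assumes a modern Python 3 (importlib.metadata needs 3.8 or later) and must run under the same interpreter Mercurial uses, since that is where hg-git imports dulwich. The FIXED_IN threshold simply mirrors the reports in this thread; it is not an official compatibility statement.

```python
from importlib.metadata import PackageNotFoundError, version

# Version reported in this thread to fix the push slowdown (assumption, not official).
FIXED_IN = (0, 9, 0)


def parse_version(text):
    """Turn '0.8.7' into (0, 8, 7); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad short versions such as '0.9' so they compare as (0, 9, 0).
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def dulwich_is_new_enough(installed):
    return parse_version(installed) >= FIXED_IN


try:
    installed = version("dulwich")
    status = "OK" if dulwich_is_new_enough(installed) else "older than 0.9.0"
    print("dulwich", installed, status)
except PackageNotFoundError:
    print("dulwich is not installed in this interpreter")
```

The parser is deliberately simple; pre-release suffixes such as "rc1" are ignored rather than ordered.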