rvaughn / cvs-fast-export

CVSNT-to-Git conversion utility

Incremental Conversion #2

Closed robotroll closed 9 years ago

robotroll commented 9 years ago

Hi,

first of all, a big thanks for this converter. It's the only one so far that is able to convert my messed-up CVSNT repo. Since I'm trying to make the transition to Git a gentle one, I would like to ask if there are any plans to implement an incremental conversion feature. My repo is huge (going back to 2007), and this would improve the conversion speed a lot for me.

Best Regards, Robin

rvaughn commented 9 years ago

Hi, thanks for the kind words. When I started this thing, I originally assumed repos would have clean, sane commit histories. I quickly found out that was a bad, bad assumption. I had a bunch of broken ones to convert too.

I considered incremental conversion previously, but it's hard to do well. There are really three main problems:

  1. CVS commit histories can change. Some of the same "features" that lead to messed up CVS repos (like moving archives around on the server) can rearrange history in hard-to-detect ways. An incremental conversion will not see those changes, so you'll never see them reflected in Git. As long as you're careful to make all new commits to CVS normally, and accept the risk, this should not be a big problem.
  2. The minor commit reordering that cvs-fast-export can do means that each new commit to CVS can rearrange the last few that got output to Git. This feature will probably have to be disabled if you are doing incremental conversions.
  3. It will probably have to redo all tags every time, even during incrementals. Tags aren't first-class objects in CVS, so it's not possible to tell when they last changed. If you use a lot of tags, this may slow down your incremental conversions anyway.

What kind of conversion times are you getting now? I might be willing to add an incremental feature, but I can't really tell you how long it will take me. The best advice I can give you for now is to run cvs-fast-export on a big machine and give it as much memory as you can. Multiple cores won't help much, but it is a huge memory hog when converting big CVSNT repos. If your host machine starts swapping, it will really kill your conversion times.

robotroll commented 9 years ago

Hi,

thanks for the quick and insightful reply. My conversion as a whole takes roughly 20 hours, and that's only because I filter out the huge SQL files before each conversion. They usually double the conversion time. I can see why you haven't implemented incremental conversion yet, and given your reasoning that's OK for me. At the moment I'm working around the issue by splitting the repository into smaller logical chunks that convert faster. The downside is that I end up with about 20 to 30 separate Git repositories.

One other thing that would ease my conversion would be some kind of filtering. Currently I filter while copying the original CVS repository to my conversion server, but I have to wipe the source folder whenever the context changes. Do you have any plans in that direction? Some kind of folder and file name filter would do the trick, I guess.

Regards, Robin
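
For readers who want to try the same copy-time workaround, here is a minimal Python sketch of it. It is not part of cvs-fast-export; the helper name `copy_filtered` and the `*.sql,v` pattern are assumptions chosen for illustration (CVS stores each file's history in an RCS archive named `<file>,v`). It does not remove files left over from an earlier copy, so the destination still has to be wiped when the filter changes.

```python
import shutil


def copy_filtered(src, dst, patterns):
    """Copy the tree at src to dst, skipping entries whose names match a pattern.

    patterns are shell-style wildcards matched against each file or
    directory name (not the full path), e.g. "*.sql,v".
    """
    shutil.copytree(
        src,
        dst,
        ignore=shutil.ignore_patterns(*patterns),
        dirs_exist_ok=True,  # requires Python 3.8+; allows copying into an existing dst
    )
```

A call such as `copy_filtered("/path/to/cvsroot/module", "/path/to/workdir/module", ["*.sql,v"])` would copy everything except the SQL archives; both paths are placeholders.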

rvaughn commented 9 years ago

Early on I considered a whole system of filtering, renaming, rearranging, etc., but ultimately I just needed to get the tool done and I never ended up needing that functionality. I'll consider adding a simple include/exclude filter, probably with wildcards. I will put this in issue #4.
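
To make the idea concrete, here is a minimal sketch of how a wildcard include/exclude filter over archive paths could behave. It is an illustration only, not the tool's implementation or its eventual option syntax; the function name `is_selected` and the rule that an empty include list selects everything are assumptions.

```python
from fnmatch import fnmatchcase


def is_selected(path, includes=(), excludes=()):
    """Return True if path passes the include/exclude wildcard filter.

    Assumed semantics: an empty include list means "include everything",
    and an exclude match always wins over an include match.
    Note that fnmatch's "*" also matches "/" characters.
    """
    included = not includes or any(fnmatchcase(path, p) for p in includes)
    excluded = any(fnmatchcase(path, p) for p in excludes)
    return included and not excluded


# Hypothetical archive paths; the SQL dump is dropped by the exclude pattern.
paths = ["src/main.c,v", "db/dump.sql,v", "docs/readme.txt,v"]
kept = [p for p in paths if is_selected(p, excludes=["*.sql,v"])]
```

Letting excludes take precedence keeps the rules easy to reason about: a broad include such as `db/*` can be combined with a narrow exclude such as `*.sql,v` without worrying about pattern order.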