svn-all-fast-export / svn2git

:octopus: A fast-import based converter for an svn repo to git repos
GNU General Public License v3.0

Please improve documentation of incremental conversions #91

Closed: mspncp closed this issue 2 years ago

mspncp commented 4 years ago

First of all, thank you for sharing this very flexible and highly configurable conversion tool.

However, I had some nasty problems getting incremental conversions to work, which I'd like to share because they were caused partly by a lack of documentation and partly by a design flaw:

Description of the problems

The --resume-from flag

The --resume-from flag is poorly documented. The svn2git --help output simply states

       --resume-from         start importing at svn revision number
       --max-rev             stop importing at svn revision number

which misled me into thinking that these flags were meant for incremental conversions. That is, if the previous svn2git run converted all revisions up to revision <n>, I thought that I needed to call svn2git --resume-from <n+1> ... to incrementally convert the newer revisions.

In the end, it turned out that the --resume-from flag isn't needed at all: the necessary information is collected automatically from the log files (see below). The --resume-from flag seems to be meant for discarding and rewriting some commits which were already committed. (I did not check this thoroughly in the source.)
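To make "collected automatically from the log files" more concrete, here is a minimal Python sketch of how a last-converted revision could be recovered from a log-<repo>.git file. It assumes the log contains lines of the form `progress SVN r<rev> branch <name> = :<mark>`; that line format is an assumption and is not confirmed anywhere in this thread. The function name `last_logged_revision` is hypothetical and not part of svn2git.

```python
import re
from pathlib import Path

# ASSUMPTION: progress lines look like "progress SVN r123 branch master = :456".
# This format is not confirmed in this thread; adjust the pattern to your logs.
PROGRESS_LINE = re.compile(r"progress SVN r(\d+) branch (.+) = :(\d+)")


def last_logged_revision(log_path):
    """Return the highest SVN revision found in the log, or None if there is none."""
    last = None
    text = Path(log_path).read_text(errors="replace")
    for line in text.splitlines():
        match = PROGRESS_LINE.search(line)
        if match:
            rev = int(match.group(1))
            if last is None or rev > last:
                last = rev
    return last
```

A helper like this is only useful for inspecting what a previous run recorded; the converter itself does not need the result passed back via --resume-from.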

The importance of the log files

The log files (log-<repo>.git) play a crucial role in incremental conversions. Unfortunately, this important detail is completely undocumented. What makes things worse is that log-<repo>.git is stored outside the Git repository, unlike other important auxiliary files (branchNotes-<repo>.git, marks-<repo>.git).

.
├── <repo>.git
│   ├── HEAD
│   ├── branchNotes-<repo>.git
│   ├── branches
│   ├── config
│   ├── description
│   ├── hooks
│   ├── info
│   ├── marks-<repo>.git
│   ├── objects
│   └── refs
└── log-<repo>.git

Since the term "log file" suggests that the file just contains a transcript from a previous run, I failed to include them when I moved the converted Git repositories from my own machine to the production server (to run incremental conversions triggered by an SVN post-commit hook). The first run of the incremental update produced tons of warnings

Exporting revision <nnnn> ..WARN: Branch "master" in repository "<repo>.git" doesn't exist at revision <nnnn> -- did you resume from the wrong revision?

and left the Git repositories in a broken state. (A lot of files were missing from HEAD^{tree}).
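A pre-flight check before an incremental run can catch this situation early. The following is a minimal Python sketch, assuming the layout shown above (each `<repo>.git` directory has a sibling file named `log-<repo>.git` in the same working directory). The function name `repos_missing_log` is hypothetical.

```python
from pathlib import Path


def repos_missing_log(workdir):
    """List <repo>.git directories in workdir that lack a sibling log-<repo>.git file."""
    workdir = Path(workdir)
    missing = []
    for repo in sorted(workdir.glob("*.git")):
        if not repo.is_dir():
            # The log-<repo>.git files also match "*.git"; they are plain files.
            continue
        if not (workdir / f"log-{repo.name}").is_file():
            missing.append(repo.name)
    return missing
```

If the returned list is non-empty, it is probably safer to abort (for example in the post-commit hook) than to let the incremental run proceed without the log files.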

Conclusion

In view of my experiences, I would like to suggest the following enhancements:

uqs commented 4 years ago

Good suggestions. Would you care to update the code and README to document the flags for example?

Also, when you have --debug-rules turned on, there is yet another logfile being produced outside the target directory.

While I have never used the converter in that fashion, it is possible to split a SVN repo into many different git repos, so I'm pretty sure that your third point isn't feasible.

mspncp commented 4 years ago

Good suggestions. Would you care to update the code and README to document the flags for example?

I can try, but it will take a few days. Also, all I can do is to document what I figured out myself, i.e., what I think the commands do.

Also, when you have --debug-rules turned on, there is yet another logfile being produced outside the target directory.

I'm not so much concerned about normal logfiles for troubleshooting. But the log-<repo>.git files are not ordinary logfiles, they contain important information which is needed for incremental conversions. Because I didn't know that and because they were outside the converted repositories, I failed to copy those files together with the repos onto our production server. This broke incremental updates and it took me quite a while to figure out what was wrong.

While I have never used the converter in that fashion, it is possible to split a SVN repo into many different git repos, so I'm pretty sure that your third point isn't feasible.

I think you are mistaken: There is not one log-* file per SVN repo, but one per Git Repo. If you have a run which creates three Git repositories repo{1,2,3}.git, then you will get three logfiles log-repo{1,2,3}.git, respectively. And those should go inside the repository, not next to it.
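Until the log file lives inside the repository, one workaround is to always move a repository and its log file as a pair. This is a minimal Python sketch, assuming the one-log-per-Git-repo naming described above (`log-` followed by the repository directory name, stored next to the repository). The function name `copy_repo_with_log` is hypothetical.

```python
import shutil
from pathlib import Path


def copy_repo_with_log(repo_dir, dest_dir):
    """Copy <repo>.git and its sibling log-<repo>.git into dest_dir.

    Raises FileNotFoundError if the log file is missing, because a copy
    without it would not support incremental conversions.
    """
    repo_dir = Path(repo_dir)
    dest_dir = Path(dest_dir)
    log_file = repo_dir.parent / f"log-{repo_dir.name}"
    if not log_file.is_file():
        raise FileNotFoundError(f"{log_file} is missing")
    dest_dir.mkdir(parents=True, exist_ok=True)
    new_repo = dest_dir / repo_dir.name
    new_log = dest_dir / log_file.name
    shutil.copytree(repo_dir, new_repo)  # fails if the target already exists
    shutil.copy2(log_file, new_log)
    return new_repo, new_log
```

For a run that produced repo1.git, repo2.git and repo3.git, calling this once per repository keeps each log-repo{1,2,3}.git next to its repository on the target machine.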

uqs commented 4 years ago

Thanks for the corrections.

Btw, incremental runs were working fine for me, but after upgrading to git 2.27 (from 2.24) the starting of the git-fast-import process eats up all RAM (I assume while processing marks) and dies. I know that version 2.27 has this behavior. I haven't bothered to bisect which exact git releases caused this regression.

Good luck!

mspncp commented 4 years ago

Thanks for the corrections. Btw, incremental runs were working fine for me, ...

You're welcome. After I noticed and corrected my initial mistakes (like using --resume-from), it works fine for me too. I mainly posted my experience here to help other users avoid the trap :-).

... but after upgrading to git 2.24 the starting of the git-fast-import process eats up all RAM

Thanks for the heads-up. Git on our SVN/Git conversion machine is version 2.20.1 and runs just fine.

mspncp commented 4 years ago

I know that version 2.27 has this behavior. I haven't bothered to bisect which exact git releases caused this regression.

If you do try to investigate it and find out something interesting, I'd be happy to hear about it.

ymartin59 commented 3 years ago

Many thanks for the details in this issue. My really big migration was stopped because of a mistake in the rules, and I was desperate when the resume failed with "Aborted - Failed to write to process". The explanation was that I had upgraded git to the Debian version 2.29.2-1~bpo10+1 to use "git-filter-repo", and that was exactly why the resume failed. I reverted to the previous 2.20.1-2+deb10u3 and then svn-all-fast-export resumed as expected. It saved my day.
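The version reports scattered through this thread can be turned into a small guard that runs before resuming a conversion. This is a minimal Python sketch that encodes only what commenters here reported: 2.20.1 and 2.24 worked, 2.27 and 2.29.2 failed. Treating 2.20 through 2.24 as one working range and 2.27 through 2.29 as one failing range is an interpolation, and nothing in the thread covers versions outside those ranges. The function names are hypothetical.

```python
import re


def parse_git_version(text):
    """Extract (major, minor[, patch]) from strings such as
    'git version 2.29.2' or the Debian package version '2.20.1-2+deb10u3'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if not match:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def resume_risk(version_text):
    """Classify a git version using only the reports from this thread."""
    major_minor = parse_git_version(version_text)[:2]
    if (2, 20) <= major_minor <= (2, 24):
        return "reported-working"   # 2.20.1 and 2.24 were reported fine
    if (2, 27) <= major_minor <= (2, 29):
        return "reported-failing"   # 2.27 and 2.29.2 were reported failing
    return "unknown"                # nothing in this thread covers it
```

The input would typically be the output of `git --version`, for example captured with `subprocess.run(["git", "--version"], capture_output=True, text=True).stdout`.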