Closed mspncp closed 4 years ago
Place the show-issue script into an empty directory and make it executable, then run it. The script can be executed several times, in which case it will reuse the existing SVN repository, unless you specify the -r (recreate) option.
show-issue
#!/bin/bash
set -o errexit

# use -r option to recreate the issue
if [ "$1" == "-r" ]; then
    rm -rf issue
fi

mkdir -p issue
cd issue

svn_url="file://$(readlink -f example.svn)"

if [ ! -e example.svn ]; then
    svnadmin create example.svn

    # COMMIT 1 - Create ttb structure
    svn mkdir "$svn_url/trunk" "$svn_url/tags" "$svn_url/branches" -m "Create ttb structure"

    svn co "$svn_url/trunk" example
    cd example
    mkdir src other
    for f in foo bar baz; do
        echo $f > $f.txt
        echo $f > src/$f.c
        echo $f > other/$f.py
    done

    # COMMIT 2 - Add my files
    svn add *.txt
    svn add src
    svn ci -m "Add my files"

    # COMMIT 3 - Add other files
    svn add other
    svn ci -m "Add other files"

    # COMMIT 4 - Create feature branch
    svn cp "$svn_url/trunk" "$svn_url/branches/feature" -m "Create feature branch"

    # COMMIT 5 - Add some stuff to my files
    echo "some stuff" >> bar.txt
    echo "some stuff" >> src/bar.c
    svn ci -m "Add some stuff to my files"

    # COMMIT 6 - Add some stuff to other files
    echo "some stuff" >> other/bar.py
    svn ci -m "Add some stuff to other files"

    svn switch ^/branches/feature

    # COMMIT 7 - Add more stuff to my files
    echo "more stuff" >> bar.txt
    echo "more stuff" >> src/bar.c
    svn ci -m "Add more stuff to my files"

    # COMMIT 8 - Add more stuff to other files
    echo "more stuff" >> other/bar.py
    svn ci -m "Add more stuff to other files"

    cd ..
fi
cat > broken.rules <<EOF
create repository broken/main.git
end repository
create repository broken/other.git
end repository
match /trunk/other/
repository broken/other.git
branch master
end match
match /trunk/
repository broken/main.git
branch master
end match
match /branches/([^/]+)/other/
repository broken/other.git
branch \1
end match
match /branches/([^/]+)/
repository broken/main.git
branch \1
end match
EOF
cat > main.rules <<EOF
create repository fixed/main.git
end repository
match /trunk/other/
action ignore
end match
match /trunk/
repository fixed/main.git
branch master
end match
match /branches/([^/]+)/other/
action ignore
end match
match /branches/([^/]+)/
repository fixed/main.git
branch \1
end match
EOF
cat > other.rules <<EOF
create repository fixed/other.git
end repository
match /trunk/other/
repository fixed/other.git
branch master
end match
match /branches/([^/]+)/$
action recurse
end match
match /branches/([^/]+)/other/
repository fixed/other.git
branch \1
end match
match /
action ignore
end match
EOF
mkdir -p broken
(
    set -o xtrace
    svn-all-fast-export --debug-rules --add-metadata --rules broken.rules example.svn
) |& tee broken.log

mkdir -p fixed
(
    set -o xtrace
    svn-all-fast-export --debug-rules --add-metadata --rules main.rules,other.rules example.svn
) |& tee fixed.log

echo
echo "##"
echo "## example.svn"
echo "##"
svn log -v "$svn_url" > example.log

for solution in broken fixed; do
    (
        for repo in main other; do
            r=$solution/$repo.git
            echo
            echo "##"
            echo "## $r"
            echo "##"
            git -C "$r" log --all --graph --stat master feature
            for b in master feature; do
                echo
                echo "tree of $b branch:"
                git -C "$r" ls-tree -r "$b"
            done
        done
    ) |& tee -a "$solution.log"
done
I renamed the title in order to emphasize that I think it's a missing svn2git feature which causes the problems I have during my SVN to Git migration.
I really like that you provided a great reproduction recipe with the script, so I think you should get a good answer.
Thank god I have one. :-)
You were already halfway to your goal using the recurse action.
You just didn't consider the prefix action. :-)
In your broken rules, replace
match /branches/([^/]+)/other/
repository broken/other.git
branch \1
end match
match /branches/([^/]+)/
repository broken/main.git
branch \1
end match
by
match /branches/([^/]+)/$
action recurse
end match
match /branches/([^/]+)/other/
repository broken/other.git
branch \1
end match
match /branches/([^/]+)/([^/]+)
repository broken/main.git
branch \1
prefix \2
end match
and you should be good to go :-)
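For reference, the suggestion above can be assembled into one complete rules file. This is only a sketch: the file name single.rules and the single/ output directory are illustrative names that do not appear in the thread, and the trunk rules are carried over unchanged from broken.rules. The heredoc delimiter is quoted so that the shell leaves \1, \2 and the trailing $ untouched.

```shell
#!/bin/bash
# Hypothetical names: "single.rules" and the "single/" output directory.
mkdir -p single

# Quoted delimiter ('EOF') keeps backslashes and '$' literal in the file.
cat > single.rules <<'EOF'
create repository single/main.git
end repository
create repository single/other.git
end repository
match /trunk/other/
repository single/other.git
branch master
end match
match /trunk/
repository single/main.git
branch master
end match
match /branches/([^/]+)/$
action recurse
end match
match /branches/([^/]+)/other/
repository single/other.git
branch \1
end match
match /branches/([^/]+)/([^/]+)
repository single/main.git
branch \1
prefix \2
end match
EOF
```

It would then be passed to the converter the same way the script passes broken.rules, i.e. with --rules single.rules.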
Thank you for taking the time to take a look at my problem. And thanks for the compliment about the reproducer. For me it was clear from the beginning that I would need to provide a good one, otherwise no one would ever bother to dig into my problem. ;-)
I didn't try your suggestion yet, but I was wondering about the missing trailing slash at the end of your last regex
match /branches/([^/]+)/([^/]+)
repository broken/main.git
branch \1
prefix \2
end match
I did actually think about similar rules, but I thought it was not allowed to omit the final slash in the regular expression if you didn't want to risk crashing svn2git? If that restriction still holds, what can I do if, say, ^/branches/release not only contains subdirectories but also regular files, which need to be separated?
That's a slight miscommunication / urban legend.
svn2git has no problem with having no final slash, otherwise you couldn't match single files.
But you have to make sure not to send paths starting with slashes to the git import.
Let me explain with an example.
Having this svn history:
------------------------------------------------------------------------
r3 | bkautler | 2020-07-13 02:15:37 +0200 (Mon, 13 Jul 2020) | 1 line
Changed paths:
A /project-b/dir-a (from /project-a/dir-a:1)
copy project-a/dir-a to project-b
------------------------------------------------------------------------
r2 | bkautler | 2020-07-13 02:15:36 +0200 (Mon, 13 Jul 2020) | 1 line
Changed paths:
A /project-b
add project-b
------------------------------------------------------------------------
r1 | bkautler | 2020-07-13 02:15:36 +0200 (Mon, 13 Jul 2020) | 1 line
Changed paths:
A /project-a
A /project-a/dir-a
A /project-a/dir-a/file-a
A /project-a/dir-a/file-b
add project-a/dir-a/file-a and project-a/dir-a/file-b
------------------------------------------------------------------------
If you use these rules:
create repository git-repo
end repository
match /project-a/dir-a
repository git-repo
branch master
end match
match /
end match
you instantly get a "fast-import crash report" with this problematic line:
* M 100644 :4294967294 /file-a
as it has no path before the slash.
If you instead use
create repository git-repo
end repository
match /project-a/dir-a
repository git-repo
branch master
prefix foo
end match
match /
end match
there is no problem and you get foo/file-a and foo/file-b in the Git repo.
This is also no problem:
create repository git-repo
end repository
match /project-a/dir-a/(file-a)
repository git-repo
branch master
prefix \1
end match
match /
end match
You get a file-a in the root directory of the repository.
I guess you get the idea now?
With
match /branches/([^/]+)/([^/]+)
repository broken/main.git
branch \1
prefix \2
end match
If it matches a file, \2 is the name of the file, without a slash anywhere, and nothing follows the actually matched part, so with prefix \2, "nothing" is prefixed with the filename.
If it matches a directory, a slash follows the actually matched part (and maybe more directories or files, but in any case a slash), which would be problematic, but \2 contains the directory name, so prefix \2 puts the actual directory name in front of the leading slash.
The only problematic case is when your rule matches a directory, does not match the following slash, and does not provide any prefix.
If you feel better you could probably instead do
match /branches/([^/]+)/([^/]+/?)
repository broken/main.git
branch \1
prefix \2
end match
as this will then match the slash if there is one and also transport it in the prefix, but imho this is unnecessary complexity that just reduces readability.
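The path handling described above can be modelled in a few lines of bash. This is a simplified model under stated assumptions, not svn2git's actual code: it assumes the rule regex is matched at the start of the SVN path and that the Git path is the rule's prefix followed by whatever remains after the matched part. The function name git_path is hypothetical, and only \2 is substituted in the prefix.

```shell
#!/bin/bash
# Simplified model (an assumption, not svn2git's implementation):
#   git path = prefix + (SVN path with the matched part removed)
git_path() {
    local svn_path=$1 regex=$2 prefix=$3
    # Anchor the rule regex at the start of the path.
    if [[ $svn_path =~ ^$regex ]]; then
        local matched=${BASH_REMATCH[0]}
        # Replace a literal \2 in the prefix with the second capture group
        # (empty if the regex has no second group).
        prefix=${prefix//'\2'/${BASH_REMATCH[2]-}}
        printf '%s\n' "${prefix}${svn_path#"$matched"}"
    fi
}

# A file directly in the branch: nothing remains after the match.
git_path /branches/feature/bar.txt   '/branches/([^/]+)/([^/]+)'   '\2'
# A file in a subdirectory: "/bar.c" remains and gets "src" in front.
git_path /branches/feature/src/bar.c '/branches/([^/]+)/([^/]+)'   '\2'
# The optional-slash variant moves the slash into the prefix instead.
git_path /branches/feature/src/bar.c '/branches/([^/]+)/([^/]+/?)' '\2'
# The problematic case: directory matched, no slash, no prefix.
git_path /project-a/dir-a/file-a     '/project-a/dir-a'            ''
```

In this model the first three calls give paths without a leading slash (bar.txt, src/bar.c, src/bar.c), while the last one gives /file-a, which is the kind of path the crash report above complains about.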
Thank you once more for the detailed explanation. Currently, I'm occupied with other things, but I will certainly do some experiments following your suggestion, since our svn-to-git migration has not been finalized yet. I'll let you know if something interesting comes out of it.
Thanks also for sharing this incredibly helpful tool :-)
I didn't, I'm basically just a user like you. I was just bugged by some bugs when I did a really huge conversion and made some fixes for problems I hit, as I needed them and the maintainer only merges PRs nowadays. C++ and Qt are not even close to my native programming language. :-D
Well, then thanks for sharing your bugfixes. ;-)
FYI @vampire: your suggestion works like a charm, at least in my toy example. I adapted my reproducer, renamed my 'fixed' solution to 'msp', and added yours as 'vampire'. For the full details, see my mspncp/svn2git-issue-109 repository. The most significant change is that your solution adds a dedicated commit for the creation of the feature branch.
svn2git-issue-109/issue.sample$ diff msp.log vampire.log
--- msp.log 2020-07-24 13:41:52.265067945 +0200
+++ vampire.log 2020-07-24 13:41:52.276068064 +0200
@@ -1,18 +1,26 @@
##
-## msp/main.git
+## vampire/main.git
##
-* commit 9eec619c74fd70916570b766cb019b0ec2207891
+* commit 97309974ad379753686cfd1e7914092dc3a699d1
| Author: msp <msp@localhost>
| Date: Fri Jul 24 11:41:51 2020 +0000
|
| Add more stuff to my files
|
-| svn path=/branches/feature/; revision=7
+| svn path=/branches/feature/bar.txt; revision=7
|
| bar.txt | 1 +
| src/bar.c | 1 +
| 2 files changed, 2 insertions(+)
+|
+* commit f3600b8b5185bdaf2b9991049e4456eda3361a3d
+| Author: msp <msp@localhost>
+| Date: Fri Jul 24 11:41:50 2020 +0000
+|
+| Create feature branch
+|
+| svn path=/branches/feature/bar.txt; revision=4
|
| * commit 7876d86a1bb2ff08b19f8f6230e34940d5aae2d5
|/ Author: msp <msp@localhost>
@@ -59,7 +67,7 @@
100644 blob 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 src/foo.c
##
-## msp/other.git
+## vampire/other.git
##
* commit 18ba918b1bfd1e6602c99a3deb8728d2016f8335
| Author: msp <msp@localhost>
Looks promising. I'll try your suggestion with my real world example as soon as time permits.
I am currently migrating a large SVN repository to Git which has a monolithic TTB structure (meaning: no subprojects, all branches are copies of the entire ^/trunk). During the conversion, most of the repository should go into a single large Git repository, except for one directory, which should move to a separate repository. I had some problems doing the conversion which I would like to describe first before asking some questions. To demonstrate my problem, I prepared a simplified toy example. Attached below the issue you will find a bash script (show-issue) to reproduce the toy example from scratch.
The SVN repository
The SVN repository (example.svn) has two branches, the trunk (^/trunk) and a feature branch (^/branches/feature). Both branches have the same directory structure:
As one can see from the following history, the feature branch was branched off the trunk in r4 and both branches received additional commits.
First migration attempt (broken)
All files from the example.svn repository should go into a single large Git repository main.git, except for the contents of the other directory, which go to other.git. This was my first attempt:
broken.rules
The Outcome
broken.log
The content of the main.git repository looks ok:
But the feature branch of the other.git repository is 'detached' from the master branch and its tree is incomplete, i.e., it contains only the single file (bar.py) which was modified after branching (in r8).
The Problem
I learned from UsingSvn2Git that the rule
won't be matched in r4 (A /branches/feature (from /trunk:3)), unless I define a recurse action
However, I also need an export action for the main repository
and unfortunately svn-all-fast-export does not support an 'export-and-recurse' action.
Second migration attempt
The only solution I could come up with to fix that conflict was to create two separate rule files:
main.rules
other.rules
MultiRules!!
Initially, the downside of this approach was that I had to call svn-all-fast-export twice, which doubled the conversion time. But then I discovered (from the source code) a handy feature, namely that you can specify a comma-separated list of rules on the command line.
The two rule files are not merged, but instead executed independently in parallel. This feature was added in commit a741bdb1913c by @tnyblom. This is an excellent feature IMO and unfortunately poorly documented.
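As a small illustration of the comma-separated form, the --rules argument can be built from a list of rule files. This is a sketch only: it prints the command rather than running it, so it works without svn-all-fast-export installed, and the variable names are made up for the example. The flags are the ones used by the show-issue script.

```shell
#!/bin/bash
# Rule files to process in one pass; names taken from the script in this issue.
rules=(main.rules other.rules)

# Join the array elements with commas (IFS is changed only inside the subshell).
rules_arg=$(IFS=,; printf '%s' "${rules[*]}")

# Print the command instead of running it; drop the echo to start the conversion.
echo svn-all-fast-export --add-metadata --rules "$rules_arg" example.svn
```

Keeping the list in an array makes it easy to add further rule files later without editing the command line by hand.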
The Outcome
This time, the result looks good:
fixed.log
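Besides reading the graph in the log, one way to check for the 'detached' symptom from the first attempt is to ask Git whether two branches share a common ancestor. A minimal sketch, assuming the converted repositories from the script exist; the helper name share_history is hypothetical. It relies on git merge-base exiting with a non-zero status when there is no common ancestor.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the two branches have a common ancestor.
# It also fails (non-zero) if the repository or one of the branches is missing.
share_history() {
    local repo=$1 a=${2:-master} b=${3:-feature}
    git -C "$repo" merge-base "$a" "$b" > /dev/null 2>&1
}

# Usage with the repositories created by the show-issue script:
for r in broken/other.git fixed/other.git; do
    if share_history "$r"; then
        echo "$r: feature is connected to master"
    else
        echo "$r: feature is detached from master (or the repository is missing)"
    fi
done
```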
Questions
Having to maintain two different rule files makes things a little bit complicated (in particular, since my real repository is not as simple as my example) and I was asking myself whether there was a different way to solve my problem using a single rules file, which I might have missed?
If not, would it be a great effort @tnyblom to add a new 'export-and-recurse' action and would this make it possible to have all rules in a single file (as in the broken attempt) using something like the following?