The Translate Project
Migrate from SVN to Git #2113

Closed julen closed 12 years ago

julen commented 12 years ago

Depends on:

(Selected 'Pootle' as the component because there wasn't a better choice.)

We want to migrate our current SVN infrastructure to Git for all the products: Translate Toolkit, Pootle, Virtaal, Amagama and any other project that's hosted in SVN.

This is a bug to track everything related to that move.

julen commented 12 years ago

Repository layouts

In SVN this is the current repository layout:

    html/                 (This might be for SF pages)
    src/                  (all the source code is under this directory)
        branches/
        tags/
        trunk/
            .cvsignore
            Pootle/
            amagama/
            corpuscatcher/
            mtscripts/
            newpaldo/
            packaging/
            pylint-hudson.ini
            pylint-ignore_files
            pylint-refactor.ini
            pylint-relaxed.ini
            run_pylint.sh
            setup.py
            setuppath
            spelling/
            spelt/
            tools/
            translate/
            virtaal/
            wordlist/

The branches, tags, and trunk of the different projects are all mixed together.

In the new layout, we would have one repository for each project. Each repository would have its own master branch (trunk), plus other branches and tags.

Imported branches and tags

At import time, only Pootle, Virtaal and Translate Toolkit have branches and tags. We may not be interested in importing all of the existing tags and branches, though. I'll try to list them later.

Since they don't contain branches or tags, small projects such as amagama, spelt, or corpuscatcher are easy to import. I have made a successful import of amagama and it's available on my GH account: https://github.com/julen/amagama

Import times

Import times have to be taken into account as well; big projects may take hours to import. For instance, Amagama, being a small project, took less than 5 minutes.

Authors mapping

Another thing we need to do before performing the final imports is to map as many authors as we can in an AUTHORS file. In SF's SVN all users are listed with an SF email address, but most developers don't use this address outside SF, so we'll need to adjust that. A call to contributors on the mailing list should be enough.
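As a starting point for the mapping, a small script could emit one default line per SVN username and let us override the entries for which contributors send a real name and address. This is only a sketch: the 'username Name <email>' line format and the '@users.sourceforge.net' default address are assumptions to check against whichever import tool is used, and the names in the usage note are made up.

```python
def default_author_line(username, name=None, email=None):
    """Return one mapping line in the assumed 'username Name <email>' format."""
    name = name or username
    # Assumed default: the SourceForge address derived from the SVN username.
    email = email or f"{username}@users.sourceforge.net"
    return f"{username} {name} <{email}>"


def build_authors_map(usernames, overrides):
    """Build the text of an authors mapping file.

    usernames: iterable of SVN usernames (duplicates are fine).
    overrides: dict mapping username -> (full name, preferred email).
    """
    lines = []
    for username in sorted(set(usernames)):
        name, email = overrides.get(username, (None, None))
        lines.append(default_author_line(username, name, email))
    return "\n".join(lines) + "\n"
```

For example, build_authors_map(["bob", "alice"], {"alice": ("Alice Example", "alice@example.org")}) gives alice her preferred identity and leaves bob on the SourceForge default until he replies to the call.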

Ignored files

svn:ignore properties can be imported into a '.gitignore' file for each project. Ignored files can also be stored in the '.git/info/exclude' file, but tracking a '.gitignore' is good practice, so we should go that way.
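Since svn:ignore is set per directory and does not apply recursively, each pattern has to be anchored to its directory when written to a single top-level '.gitignore'. A minimal sketch of that conversion, assuming the property values have already been collected into a dict (the function name and the input shape are made up for illustration):

```python
def svn_ignore_to_gitignore(props):
    """Convert collected svn:ignore values into .gitignore text.

    props maps a directory path relative to the repository root
    ('' for the root itself) to the raw svn:ignore property value.
    """
    lines = []
    for directory in sorted(props):
        stripped = directory.strip("/")
        # A leading slash anchors the pattern to this directory only,
        # matching the non-recursive behaviour of svn:ignore.
        prefix = f"/{stripped}/" if stripped else "/"
        for pattern in props[directory].splitlines():
            pattern = pattern.strip()
            if pattern:
                lines.append(prefix + pattern)
    return "\n".join(lines) + "\n"
```

The output can be written to '.gitignore' in the root of the imported repository and committed as the first change after the import.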

Git hosting

We need to decide where the new Git repos will be hosted. There are several options available, including GitHub, SourceForge, Gitorious, repo.or.cz...

Other bits

Depending on where we host the final repositories, other bits will need to be adjusted:

julen commented 12 years ago

Regarding the repository hosting, I would definitely go for GitHub. It is free, easy to set up, user-friendly, has a huge developer user base, and makes it easy for people to contribute. I'd say the potential contributors to the project are already there.

GitHub also offers managing an account as if it was an organization, so repositories are group-owned: https://github.com/blog/674-introducing-organizations

julen commented 12 years ago

Ohloh might need attention too.

julen commented 12 years ago

I have been running long import tasks over the last few days on a Linode instance, with no success.

The process eats all the system resources and reaches a point where it can't continue: Can't fork at /usr/share/perl5/Git.pm line 1262.

Googling, I found another svn2git tool from the KDE guys that serves the same purpose (so weird that they have the same name!): http://gitorious.org/svn2git/svn2git/trees/master/src

It has the ability to define rules, so maybe we'll have the option to import only what's really required. In any case, it seems to do the job much better and faster than the git-svn wrapper, as noted in this blog post: http://blog.smartbear.com/post/10-12-16/migrating-from-subversion-to-git-lessons-learned/

Will report when further progress is made.

julen commented 12 years ago

Created attachment 822

Import script

This script runs the import procedure for all the products. In the cwd it needs to have three directories:

After running everything, there will be a repository in a separate folder for each app. They just need to be pushed somewhere to be public :)

julen commented 12 years ago

I forgot to say that this script relies on KDE's svn2git tool.

julen commented 12 years ago

Oh, and there must be an 'authors-map' file in the script's cwd. That's the authors mapping file.
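For readers without access to the attachment, the per-product loop could be sketched roughly as below. This is not the attached script. The executable name 'svn-all-fast-export' and the '--identity-map' and '--rules' options are what I believe KDE's svn2git uses, so treat them as assumptions to verify against the tool's help output, and all paths and the 'rules/' directory are hypothetical.

```python
def build_import_command(svn_repo, rules_file,
                         authors_map="authors-map",
                         tool="svn-all-fast-export"):
    """Return the argument list for one import run (nothing is executed)."""
    return [
        tool,
        "--identity-map", authors_map,  # assumed option: authors mapping file
        "--rules", rules_file,          # assumed option: per-product rules file
        svn_repo,                       # path to a local copy of the SVN repo
    ]


def plan_imports(svn_repo, products, rules_dir="rules"):
    """Map each product name to the command that would import it."""
    return {
        product: build_import_command(svn_repo, f"{rules_dir}/{product}.rules")
        for product in products
    }
```

Each command list can then be passed to subprocess.run(cmd, check=True) so that a failed import stops the whole run instead of going unnoticed.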

friedelwolff commented 12 years ago

Julen, I'm guessing that the rules might contain important detail, or am I wrong? Do I understand correctly that no branches or tags are imported with your current setup?

julen commented 12 years ago

(In reply to BZ-IMPORT::comment #8)

Julen, I'm guessing that the rules might contain important detail, or am I wrong? Do I understand correctly that no branches or tags are imported with your current setup?

Branches and tags are imported. Please read the details for each application and its rules files.

dwaynebailey commented 12 years ago

Choice of hosting is a decision that needs to be made. I know there are various options, but it does seem like we're really only talking about our own hosting, GitHub, or SF. What are the pros and cons of each?

friedelwolff commented 12 years ago

Maybe I'm missing it, but where are the rules files? Attachment 822 is only the bash script.

friedelwolff commented 12 years ago

(In reply to BZ-IMPORT::comment #10)

Choice of hosting is a decision that needs to be made. I know there are various options, but it does seem like we're really only talking about our own hosting, GitHub, or SF. What are the pros and cons of each?

In my mind I only really had SourceForge and GitHub. SourceForge is where almost everything (except Bugzilla) is. GitHub has a nicer UI, as I understand from Julen (I haven't really used it).

dwaynebailey commented 12 years ago

(In reply to BZ-IMPORT::comment #11)

Maybe I'm missing it, but where are the rules files? Attachment 822 [details] is only the bash script.

The rules files are in each of the product-specific sub-bugs. Julen made one bug per task.

dwaynebailey commented 12 years ago

(In reply to BZ-IMPORT::comment #12)

(In reply to BZ-IMPORT::comment #10)

Choice of hosting is a decision that needs to be made. I know there are various options, but it does seem like we're really only talking about our own hosting, GitHub, or SF. What are the pros and cons of each?

In my mind I only really had SourceForge and GitHub. SourceForge is where almost everything (except Bugzilla) is. GitHub has a nicer UI, as I understand from Julen (I haven't really used it).

If it makes anything easier, I'm leaning towards GitHub because of the public/private repo ability, teams, etc., while SF is just plain Git.

julen commented 12 years ago

I'm in favor of GH as well. It's not just the UI (which is nice, too); there are more things, which I highlighted in BZ-IMPORT::comment #2.

The things I value the most are the visibility the project would gain, the developer base, and the ease of contributing patches through pull requests.

dwaynebailey commented 12 years ago

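I was just reviewing our SVN and noticed that some things aren't being migrated. I think we must migrate everything so that we can retire SVN.

Below is a list with comments and my opinions:

  • mtscripts - this was for Translate's work on Moses. Migrate
  • packaging - various spec files and service scripts. Ignore
  • newpaldo - don't think this will move any further. Ignore?
  • wordlist - tools for dictionaries. Migrate with ZAF?
  • pylint* - this was mostly for Hudson. Migrate since I think hudson stuff should be tracked in VC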

julen commented 12 years ago

(In reply to BZ-IMPORT::comment #16)

I was just reviewing our SVN and noticed that some things aren't being migrated. I think we must migrate everything so that we can retire SVN.

Below is a list with comments and my opinions:

  • mtscripts - this was for Translate's work on Moses. Migrate
  • packaging - various spec files and service scripts. Ignore
  • newpaldo - don't think this will move any further. Ignore?
  • wordlist - tools for dictionaries. Migrate with ZAF?
  • pylint* - this was mostly for Hudson. Migrate since I think hudson stuff should be tracked in VC

Alright, but do the items listed as 'migrate' need to be migrated as separate projects or as part of other projects? If the latter, as part of which?

dwaynebailey commented 12 years ago

(In reply to BZ-IMPORT::comment #17)

(In reply to BZ-IMPORT::comment #16)

I was just reviewing our SVN and noticed that some things aren't being migrated. I think we must migrate everything so that we can retire SVN.

Below is a list with comments and my opinions:

  • mtscripts - this was for Translate's work on Moses. Migrate
  • packaging - various spec files and service scripts. Ignore
  • newpaldo - don't think this will move any further. Ignore?
  • wordlist - tools for dictionaries. Migrate with ZAF?
  • pylint* - this was mostly for Hudson. Migrate since I think hudson stuff should be tracked in VC

Alright, but do the items listed as 'migrate' need to be migrated as separate projects or as part of other projects? If the latter, as part of which?

  • mtscripts - separate
  • wordlist - this would mean migration of the ZAF translations, which won't happen until we have Git support in Pootle.
  • pylint* - Friedel, where do you want your CI stuff?

julen commented 12 years ago

Created attachment 827

Import script

Added 'mtscripts'.

julen commented 12 years ago

As per comments #14 and #15, could we say we'll host our repos in GH?

julen commented 12 years ago

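Once we migrate, we'll also need to shut down SF SVN repo access by:

  • Going to the SF's Project Admin → Features
  • Selecting Manage next to Subversion
  • Choosing Non-SF.net resource and providing a text and a URL to the new repos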

dwaynebailey commented 12 years ago

(In reply to BZ-IMPORT::comment #20)

As per comments #14 and #15, could we say we'll host our repos in GH?

Yes. Friedel?

(In reply to BZ-IMPORT::comment #21)

Once we migrate, we'll also need to shut down SF SVN repo access by:

  • Going to the SF's Project Admin → Features
  • Selecting Manage next to Subversion
  • Choosing Non-SF.net resource and providing a text and a URL to the new repos

Can we make the existing one available read-only? What will happen if you try to check in to SF SVN?

julen commented 12 years ago

(In reply to BZ-IMPORT::comment #22)

Can we make the existing one available read-only? What will happen if you try to check in to SF SVN?

A pre-commit hook script that exits with non-zero status would work, but I'm afraid we can't add our own scripts in SF's subversion repository.
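For a repository where we do control the hooks, such a pre-commit script could look roughly like this. It is a sketch, not something tested against SF: Subversion runs the hook with the repository path and the transaction name as arguments, rejects the commit on a non-zero exit status, and sends whatever the hook wrote to stderr back to the client. The message wording is made up.

```python
#!/usr/bin/env python3
"""Sketch of a Subversion pre-commit hook that rejects every commit."""
import sys

NEW_HOME = "https://github.com/translate/"


def reject_message(new_home=NEW_HOME):
    """Text shown to the committer when the commit is refused."""
    return (
        "This SVN repository is read-only.\n"
        f"Development has moved to {new_home}\n"
    )


def main(argv):
    # argv[1] is the repository path and argv[2] the transaction name;
    # neither is needed because every commit is rejected.
    sys.stderr.write(reject_message())
    return 1  # any non-zero status makes Subversion abort the commit


# Only act when invoked the way Subversion invokes hooks (two arguments).
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(main(sys.argv))
```

The file would be installed as 'hooks/pre-commit' in the repository directory on the server and made executable, which is exactly the part we probably cannot do on SF.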

Another way of making the repo read-only would be to revoke write permissions from all users. Cumbersome, but it would work.

In any case, I would point developers to the new GH repo, otherwise they might think the code is unmaintained or the project is dead.

julen commented 12 years ago

All repositories have been successfully migrated: https://github.com/translate/