sagemath / trac-to-github

Script to migrate Trac tickets to GitHub issues and the Trac wiki to markdown. Input: https://trac.sagemath.org/ ➠ Intermediate: https://github.com/sagemath/trac_to_gh ➠ Output: https://github.com/sagemath/sage/issues
https://trac.sagemath.org/ticket/30363

When no GitHub access token is given, write out json files in migration archive format instead #11

Closed mkoeppe closed 1 year ago

mkoeppe commented 1 year ago

As described in https://gist.github.com/jfine/dbd5e97d708adbba027660adfeb62771 and https://github.github.com/enterprise-migrations/#/./2.1-export-archive-format

mkoeppe commented 1 year ago

The migration script uses https://github.com/PyGithub/PyGithub; I think we can also create these objects (such as https://github.com/PyGithub/PyGithub/blob/master/github/IssueComment.py) offline and then serialize them to JSON. Each object becomes a separate numbered file.
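
Roughly what I have in mind, as a sketch only: it relies on PyGithub 1.x's internal constructor signature `(requester, headers, attributes, completed)`, which is not a stable public API and may differ between versions.

```python
import json

from github.IssueComment import IssueComment

# Sketch: build an IssueComment offline, with no API calls.
# Assumption: PyGithub 1.x's internal constructor takes
# (requester, headers, attributes, completed); passing completed=True
# keeps PyGithub from trying to lazily fetch missing fields.
attributes = {
    "body": "Comment text converted from Trac markup",
    "created_at": "2020-08-14T00:00:00Z",
    "user": {"login": "mkoeppe"},
}
comment = IssueComment(None, {}, attributes, completed=True)

print(comment.body)                            # the body we passed in
print(json.dumps(comment.raw_data, indent=2))  # raw_data is the attribute dict, ready to write out
```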

mkoeppe commented 1 year ago

@dimpase

dimpase commented 1 year ago

I'm confused - I thought what GitHub suggests is a completely different migration process. To recap, quoting abbycabs@github.com:


No costs or enterprise account needed, but I may have to flag your account in for some of our tooling.

Our import tool (ECI) did the Python migration. If you're able to create an export that generates a similar archive from Trac as they did with bpo, you should be able to mimic the behaviour with 'mannequins'. We don't have a Trac exporter, so if you're interested in using our importer, you'll have to write one. As long as the archive file you generate is compatible with the importers, it will be imported.

Here's the info I was able to get from our migrations team:

GitHub’s migration tooling currently utilizes an import/export archive approach where exporters generate archives and importers consume them. Although the interface is different on cloud vs. server, both consume and generate the same archive format. One thing to note is that when importing an archive on server (https://docs.github.com/en/enterprise-server@3.0/admin/user-management/preparing-to-migrate-data-to-your-enterprise), the logs (and errors) are directly accessible, which is extremely helpful for debugging when building exporters. Because of this, you may find iteration much quicker by first ensuring the archive can be successfully imported to server (https://enterprise.github.com/trial) before testing on cloud.

One of the challenging aspects of creating an exporter is that there isn’t a public/formal spec; there is, however, an older/informal one (https://gist.github.com/jfine/dbd5e97d708adbba027660adfeb62771) which will still be helpful in getting a broad understanding of how the archive is structured. Another helpful approach is to generate an example archive from a test or existing repository on server (https://docs.github.com/en/enterprise-server@3.0/admin/user-management/exporting-migration-data-from-your-enterprise) or on cloud (https://docs.github.com/en/enterprise-server@3.0/admin/user-management/exporting-migration-data-from-githubcom) and view its contents.

You might find it helpful to see the source for other exporters. Neither is openly available, but we informally distribute them during service engagements via a shared Google Drive. The ones that are available are the Bitbucket Server Exporter (https://drive.google.com/drive/folders/1Oizu_sJ_snssOFgmyozyEn-w9YFVezhJ?usp=sharing) and the GitLab Exporter (https://drive.google.com/drive/folders/15K-FD7sKq0yMabFOMbj2y8IOFHBhxZRx?usp=sharing).

When you’re ready to test importing to GitHub.com (https://github.github.com/enterprise-migrations/#/./3.0-import-diagram), there is an early access tool called Cloud Importer. Since it is early access, there are some prerequisites (https://github.github.com/enterprise-migrations/#/./0.3.1.6-tools-prerequisites-import-ghec). Once Cloud Importer is enabled, it can be accessed directly via GraphQL (https://github.github.com/enterprise-migrations/#/3.1.2-import-using-graphql-api) or via Enterprise Cloud Importer (https://github.github.com/enterprise-migrations/#/./3.1.1-import-from-archive).

mkoeppe commented 1 year ago

The migration archive that we are asked to create consists of JSON objects, one per file; see for example https://gist.github.com/jfine/dbd5e97d708adbba027660adfeb62771#issue-comment
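
For illustration, here is a hypothetical record in roughly the shape the informal spec sketches for an issue comment, written as a Python dict since that is how the script would build it. The exact field set, values, and file naming need to be checked against the gist and the official docs; the URLs below are placeholders.

```python
# Hypothetical issue_comment record; field names are taken loosely from the
# informal spec linked above, and the URLs are placeholders, not real mappings.
issue_comment_record = {
    "type": "issue_comment",
    "url": "https://trac.sagemath.org/ticket/30363#comment:1",
    "issue": "https://trac.sagemath.org/ticket/30363",
    "user": "https://example.org/users/mkoeppe",
    "body": "Comment text, converted from Trac markup to Markdown",
    "formatter": "markdown",
    "created_at": "2020-08-14T00:00:00Z",
}
```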

mkoeppe commented 1 year ago

Instead of just printf-ing in this format, I'm proposing to create PyGithub objects and then write them out as JSON.
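
Concretely, something along these lines, continuing the IssueComment sketch above; `raw_data` is PyGithub's dict view of an object, while the directory layout and numbering here are made up for the sketch:

```python
import json
from pathlib import Path


def write_record(obj, outdir: Path, prefix: str, number: int) -> Path:
    """Write one PyGithub object (e.g. an IssueComment built offline) to its
    own numbered JSON file. The prefix/numbering scheme is hypothetical."""
    outdir.mkdir(parents=True, exist_ok=True)
    path = outdir / f"{prefix}_{number:06d}.json"
    path.write_text(json.dumps(obj.raw_data, indent=2))
    return path


# e.g.: write_record(comment, Path("archive"), "issue_comments", 1)
```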

dimpase commented 1 year ago

That's certainly the way to go.

mkoeppe commented 1 year ago

Happening in #14

mkoeppe commented 1 year ago

Official info on the format: https://github.github.com/enterprise-migrations/#/./2.1-export-archive-format

mkoeppe commented 1 year ago

Next step: Try out the import using a GitHub Enterprise Server instance - https://groups.google.com/g/sage-devel/c/XjDvmuBsHmo/m/X5p2m99HDAAJ