psf / gh-migration

This repo is used to manage the migration from bugs.python.org to GitHub.

Write a tool to export data from bpo #4

Closed ezio-melotti closed 2 years ago

ezio-melotti commented 3 years ago

In order to import data into GitHub we need to export bpo data in a format compatible with the importer tool.

There are at least 5 ways to do this:

  1. Using the Roundup Python API to directly access the db (see below);
  2. Using roundup-admin to export the data and then parsing the output;
  3. Using the REST API;
  4. Using the XMLRPC interface;
  5. Accessing the PostgreSQL DB directly.

The first option is likely the easiest solution. The script that generates the weekly "Summary of Python tracker issue" does something similar to access the database and extract data about the issues. The Roundup documentation has a table that summarizes the available functions.
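As a rough illustration of the first option, here is a minimal sketch that reads issues through Roundup's hyperdb interface (`db.issue.list()` and `db.issue.get()`). The `export_issues` helper, the choice of fields, and the tracker path in the docstring are hypothetical and only meant to show the shape of the approach, not the actual tool.

```python
def export_issues(db, fields=("title", "status")):
    """Return one dict per issue, read through the Roundup hyperdb API.

    `db` only needs to expose `db.issue.list()` and `db.issue.get(id, prop)`,
    as a Roundup database does. With a real tracker it could be opened with
    something like (the path is a placeholder):

        from roundup import instance
        db = instance.open("/path/to/tracker").open("admin")
    """
    issues = []
    for issue_id in db.issue.list():
        record = {"id": issue_id}
        for field in fields:
            # Link properties such as `status` come back as the id of the
            # linked node, not as a human-readable name.
            record[field] = db.issue.get(issue_id, field)
        issues.append(record)
    return issues
```

Because the function only depends on `list()` and `get()`, it can be tried against a small stand-in object before pointing it at a real tracker.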

By using one of these solutions, we can write a tool that extracts the data from bpo and rearranges it into the right format. The tool will also need to reformat the issues (see #3), rearrange the labels, and possibly make other changes. The first version of the tool doesn't need to include these changes -- they can be added once we have solved the other issues.
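The "rearranging" step could look roughly like the sketch below. The importer's format isn't specified here, so every key name in the output, the `to_import_record` helper, and the `label_map` argument are assumptions made purely for illustration.

```python
def to_import_record(issue, label_map):
    """Rearrange one exported bpo issue into a flat record for an importer.

    All key names are hypothetical; the real importer format may differ.
    `label_map` maps bpo component names to target label names.
    """
    # Keep only components that have a known target label, without duplicates.
    labels = sorted(
        {label_map[c] for c in issue.get("components", []) if c in label_map}
    )
    return {
        "title": issue["title"],
        "body": issue["body"],
        "labels": labels,
        "closed": issue.get("status") == "closed",
        "bpo_id": issue["id"],
    }
```

Keeping the label mapping as a separate argument means the label rearrangement can be revised later without touching the export code.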

We should also take care of exporting attachments such as patches, sample scripts, screenshots, etc.

Update (2021-09-16) I'm writing a tool using the first option above:

ezio-melotti commented 2 years ago

This is now done: all relevant items are exported. For files/attachments, we decided to keep hosting them on bpo, and simply add a direct link to them.
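A minimal sketch of the "direct link" idea, assuming each exported file record carries an `id` and a `name`, and that files are served under a `/file<id>/<name>` path on bpo. Both the record shape and the URL pattern are assumptions for illustration, and `attachment_links` is a hypothetical helper.

```python
from urllib.parse import quote


def attachment_links(files, base_url="https://bugs.python.org"):
    """Build a Markdown bullet list linking to attachments still hosted on bpo.

    The `/file<id>/<name>` URL layout is an assumption, not taken from the tool.
    """
    lines = []
    for f in files:
        # Percent-encode the file name so spaces and similar characters are safe.
        url = f"{base_url}/file{f['id']}/{quote(f['name'])}"
        lines.append(f"- [{f['name']}]({url})")
    return "\n".join(lines)
```

The resulting text can be appended to the migrated issue body or message, so nothing has to be re-uploaded.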

Some issue fields still need some tweaking (see #5). Some messages still need link/ref rewriting (see #3).