psf / gh-migration

This repo is used to manage the migration from bugs.python.org to GitHub.

Migration and risk management plans #13

Open ezio-melotti opened 2 years ago

ezio-melotti commented 2 years ago

This issue describes the migration plan, testing strategy, execution plan, and risk management plan. This list of steps is not final: new steps might be added, the time estimates should be made more accurate, and each step should be assigned to someone. This plan overrides PEP-588, and might eventually be turned into a PEP. For the time being, it is kept here for convenience.

This document uses the following terms:

Migration plan

These are the steps required to migrate issues from bpo to GitHub:

  1. Inform the users about the migration (~2w)
  2. Start the migration by making bpo read-only
  3. Export all issues from bpo (<1h -- ~22m without attachments)
  4. Import issues in a new repo through the ECI (~12h, down from an earlier estimate of ~25h *)
  5. Enable the issues tab on the cpython repo
  6. Transfer issues to the cpython repo (~20h, down from an earlier estimate of ~4-7d **)
  7. Possibly setup and run post-migration actions
  8. Test everything and remove the issue template from the cpython repo
  9. Inform the users that the migration happened

* Earlier estimate, now superseded: importing 500 issues (without attachments) on a Friday morning (Europe)/Thursday night (US) took 13m. We currently have almost 60k issues, so it should take around 25h. Earlier imports took about half of this time though, so it might depend on the server load. Further testing showed that the import takes about 12h.

** The transfer has been optimized, and it now takes about 20h.
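
The original ~25h figure in the first footnote is a linear extrapolation from the timed 500-issue sample. A minimal sketch of that arithmetic, assuming import time scales linearly with the number of issues and taking 60,000 as an upper bound for "almost 60k" (the function name is illustrative, not part of any migration tool):

```python
def estimate_hours(total_issues: int, sample_issues: int, sample_minutes: float) -> float:
    """Linearly extrapolate from a timed sample import to the full issue count."""
    return total_issues / sample_issues * sample_minutes / 60


# 500 issues took 13 minutes; scale up to roughly 60k issues.
print(f"{estimate_hours(60_000, 500, 13):.1f}h")  # prints "26.0h"
```

The result is in line with the "around 25h" quoted above. The measured ~12h shows that the linear model overestimates, which matches the note that the import time depends on server load.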

Testing strategy

Each step of the previous list should be tested (if possible):

  1. :heavy_check_mark: Informing users is tested by telling them and seeing their reaction.
  2. :heavy_check_mark: Should be tested on a local instance of bpo. The test should verify that it's not possible to create new issues or edit existing ones (this includes both changing fields and adding new comments). Issue redirects can also be tested and enabled before the migration starts.
  3. :heavy_check_mark: This has been tested several times already, but a full test export should be performed shortly before the actual migration.
  4. :heavy_check_mark: Like step 3, this has also been tested and should be tested with a full import before the actual migration.
  5. :heavy_check_mark: The issue template config has been tested on a separate repo and on python/cpython.
  6. :heavy_check_mark: We already performed a test import with a subset of the issues (~500). We will perform more tests using small subsets until all the issues are ironed out, and we should perform a full test import before doing the actual migration.
  7. :heavy_check_mark: Originally: GitHub Actions (e.g. updating issue references) can be tested on separate repos, and possibly added to the source tree before the migration starts. Update: we currently don't have any additional actions.
  8. :heavy_check_mark: This is just a matter of merging a PR that removes the issue template config file. (python/cpython#32106)
  9. :heavy_check_mark: This doesn't require testing for emails/social media, but it does for #12.

Execution plan

If all goes well, these are the actions that we will take:

  1. Users should be informed through different means, including but not limited to mails to python-dev/python-committers, posts on Discourse, blog posts and other social media, and a banner on bpo.
  2. [Fri 25, evening] When the migration starts, the PR that makes bpo read-only will be merged and tested. The PR should also include a banner for bpo to explain to users that the migration is in progress.
    • [x] Merge psf/bpo-tracker-cpython#16 (@ezio-melotti)
    • [x] Disable python/cpython -> bpo webhook (@ezio-melotti)
    • [x] Test that bpo is read-only (@ezio-melotti, @ambv)
  3. [Fri 25, evening] After the PR has been merged and deployed, and after verifying that bpo is read-only, the export tool will be used to produce a zip file.
    • [x] use the export tool to create the zip (@ezio-melotti)
  4. [Fri 25, evening] The zip file will then be fed into the ECI. Given the number of issues, the ECI might time out and must be monitored to ensure that the import completes successfully. This will result in a new and separate repo that will include all the bpo issues.
    • [x] Import the archive into the ECI (@ezio-melotti)
    • [x] ~Start a backup import \~4h in (@ezio-melotti)~
      • GitHub says it will only increase the load and make the first import slower
    • [x] Save the migration ID/GUID of the import (@ezio-melotti)
    • [x] Get the name of the on-call GitHub engineer (@ezio-melotti)
    • [x] Monitor the import overnight until it's complete (@ezio-melotti, GitHub team)
      • If the import gives an error, use the "Retry" button to resume
      • If it gets stuck without errors, ping GitHub
  5. [Sat 26, morning] At this point, we can enable the issues tab, with the issue template config already in place.
    • [x] Enable the issues tab (@ezio-melotti)
  6. [Sat 26, morning] After everything is ready, we will inform GitHub. They will then start the issue transfer. This will need to be monitored in case of errors.
    • [x] Inform the GitHub team (@ezio-melotti)
    • [x] Start the transfer and monitor it until it's complete (GitHub team, @ezio-melotti, @ambv)
  7. [Sun 27, morning] Once the transfer is complete, we might need to run some post-migration actions (e.g. to update issue references). We will also manually run some of the other installed actions to make sure they work properly. Note that some actions might need to be tested after the next step. (@ambv, @ezio-melotti)
    • [x] Retrieve issue mapping from the GitHub team (@ezio-melotti)
    • [x] Update the github field of all issues on bpo (@ezio-melotti)
    • [x] Merge psf/bpo-tracker-cpython#17 (@ezio-melotti)
    • [x] Update bpo-* autolinking on python/cpython (@ezio-melotti)
    • [x] TBD (@ambv, @ezio-melotti)
  8. [Sun 27, morning] Once all the issues have been transferred and tested, the issue template config will be removed from the cpython repo, allowing users to create new issues.
    • [x] python/cpython#32106 (@ezio-melotti)
  9. [Sun 27, afternoon] Pre-written messages will be sent out on MLs and social media to inform the users. The script required for #12 could be run now or later. Additional actions (e.g. weekly summary) could also be installed later.
    • [x] Update bpo banner: psf/bpo-tracker-cpython#12 (@ezio-melotti)
    • [x] Post a Discourse announcement (@ezio-melotti)
    • [x] Post a python-dev announcement (@ezio-melotti)
    • [x] Merge the devguide update PR (python/devguide#814) (@ambv, @ezio-melotti)
    • [x] Merge the docs.python.org issue links PR (python/cpython#32342) (@ezio-melotti)
    • [x] Remove the weekly summary cronjob on bpo (python/psf-salt#234) (@ezio-melotti)
    • [x] Remove irker on bpo (python/psf-salt#232) (@ezio-melotti)
    • [ ] TBD
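
The overnight monitoring in step 4 (retry on error, ping GitHub if the import stalls) can be sketched as a polling loop. This is only an illustration: `get_status`, `retry`, and `escalate` are hypothetical callables standing in for whatever manual or scripted checks are actually used, the `"done"`/`"error"` status strings are assumptions, and nothing here is an ECI API.

```python
import time
from typing import Callable, Optional


def monitor_import(
    get_status: Callable[[], str],   # hypothetical: returns "done", "error", or a progress string
    retry: Callable[[], None],       # hypothetical: equivalent of pressing "Retry"
    escalate: Callable[[], None],    # hypothetical: ping the on-call GitHub engineer
    poll_seconds: float = 300,
    stall_polls: int = 6,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until the import is done, retrying on errors and escalating on stalls."""
    last: Optional[str] = None
    unchanged = 0
    while True:
        status = get_status()
        if status == "done":
            return
        if status == "error":
            retry()
            unchanged = 0
        elif status == last:
            # No visible progress since the previous poll.
            unchanged += 1
            if unchanged >= stall_polls:
                escalate()
                unchanged = 0
        else:
            unchanged = 0
        last = status
        sleep(poll_seconds)
```

Passing `sleep` as a parameter keeps the loop easy to dry-run without waiting between polls.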

There are also a number of related changes that should be done:

After the migration, and once we have the bpo->GH mapping, we could:

These changes affect the "Last update" datetime, so we could do them lazily through a GitHub action whenever someone edits an existing issue.
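
Updating issue references (mentioned in step 7) could look roughly like the following once the bpo->GH mapping is available. This is a minimal sketch: the `gh-` prefix for rewritten references, the function name, and the mapping values are assumptions made for illustration.

```python
import re

# Matches references such as "bpo-12345".
BPO_REF = re.compile(r"\bbpo-(\d+)\b")


def rewrite_refs(text: str, mapping: dict[int, int]) -> str:
    """Replace bpo-N references with gh-M using a bpo->GitHub issue mapping.

    References without an entry in the mapping are left untouched.
    """
    def repl(match: re.Match) -> str:
        gh_number = mapping.get(int(match.group(1)))
        return f"gh-{gh_number}" if gh_number is not None else match.group(0)

    return BPO_REF.sub(repl, text)


# Hypothetical mapping entry, for illustration only.
print(rewrite_refs("See bpo-1000 and bpo-2.", {1000: 45000}))
```

Leaving unmapped references unchanged means a partial mapping never produces a wrong link, which suits the lazy, per-edit approach described above.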

Risk management plan

This section discusses the failures we might encounter during each step of the migration and suggests ways to prevent them and deal with them. None of these things are expected to happen, but we should have a plan B just in case.

  1. Once we inform the users:

    • They might protest, but at this point the migration is going to happen, so the best we can do is address their feedback to the best of our ability.
  2. When we make bpo read-only:

    • If we fail to make bpo read-only, the migration will be delayed until we have verified that it is not possible to create/edit issues. This should also be tested on a local copy of the tracker beforehand.
    • If we make bpo read-only, but people (or bots) somehow manage to create a few issues and/or messages some other way, we could just inform them and ask them to recreate them on GitHub once the migration is done (if it's just bot messages we could even ignore them).
  3. Exporting issues from bpo:

    • This is easy to test but if somehow a new/recent issue/message breaks the exporter, I could try to identify and fix the problem on the fly, causing a small delay. If the issue is too complex to fix quickly, we might reopen bpo and reschedule the migration.
    • We depend heavily on the devguide documentation to ease the transition from bpo to GitHub Issues for users unfamiliar with GitHub Issues.
  4. Import issues in the ECI:

    • This is also easy to test, but time-consuming. We could also import the archive twice at the same time, so that if an import fails the other might succeed. If they both succeed we will also have a backup repo in case something goes wrong during the transfer.
    • If the import times out (as often happens with big archives), a "Retry" button appears that will generally make the import resume. The timeouts also report a code and the migration ID, and these can be used by GitHub to investigate the issue.
    • If the import fails because of a problem with the archive, either the problem should be fixed by opening and editing the archive manually, or by fixing the exporting tool and exporting a new archive. A full test import before the migration should help mitigate this risk.
    • If the import fails because of a problem with the ECI and can't be resumed, we will have to restart the import.
    • If the PC performing the import crashes or in case of blackout, it won't be possible to hit "Retry" from the ECI, but we could use the migration IDs to resume and complete the migration. The migration IDs should be saved beforehand. If this happens soon after the migration starts, it might be better to restart it from the ECI.
  5. Possibly partially lock the cpython repo:

    • Once we have decided if/how to do this, we should be able to test it on a separate repo, so it shouldn't fail as long as we document the steps and follow them.
    • If locking doesn't work and people are somehow able to create issues, this will interfere with the numbering but I guess we will have to live with it (the numbering is changed anyway). As long as we advertise somehow that the migration is happening and users shouldn't create/edit issues, I think it's ok if those issues get lost.
  6. Transfer issues to the cpython repo:

    • This is handled entirely by the GitHub team, so we have little control over this. It seems they have a certain degree of control, and they can transfer in batches and/or resume/retry the transfer. Doing a full test transfer will ensure that there are no issues with problematic fields.
    • If an issue can't be transferred, it might be possible to edit the source issue and try again. If the import stops at the first failure, we might be able to preserve the ID ordering, if not, it could also be transferred again at the end or even after the migration.
    • Transferring deletes issues from the source repo, so -- unless there is a way to preserve them -- if something goes wrong and the transfer needs to be performed again, the archive will need to be imported again. This could be done preemptively so that after exporting the bpo issues we import the archive twice in two separate repos.
  7. Possibly setup and run post-migration actions

    • This depends on the actual actions being executed.
    • Once the migration is completed successfully, every other non-critical action could be done afterward, and should only cause minor inconveniences.
  8. Unlock the cpython repo and test everything

    • If something went wrong, we could disable the issues tab and unlock the repo while we investigate. We might be able to fix the issue directly, or possibly we will have to lock it again for a short time to re-import a few issues. Worst case scenario we will have to wipe away all issues and redo the transfer from scratch. Having a script able to inspect/edit/remove one or more issues through the API (since if the issues tab is disabled we won't be able to do it from there) might be helpful.
  9. Inform the users that the migration happened

    • We should be able to address any concern that didn't arise before the migration after the migration is complete. Informing the users clearly, widely, and in advance will help ensure that people know about the migration, about what is getting transferred, about the duration of the downtime, and other things. This should help minimize surprises and hostile reactions.
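
The script mentioned in step 8 for inspecting and editing issues through the API could start from something like the sketch below. It uses the GitHub REST endpoints `GET` and `PATCH /repos/{owner}/{repo}/issues/{issue_number}`. The helper names and the token handling are assumptions, and removing issues is not covered here. It is also worth verifying on a test repo that the API still accepts these requests while the issues tab is disabled, since it may refuse them.

```python
import json
import urllib.request
from typing import Optional

API = "https://api.github.com"


def build_request(repo: str, number: int, token: str,
                  changes: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET (inspect) or PATCH (edit) request for a single issue.

    `repo` is "owner/name"; `changes` is a dict of fields to update,
    e.g. {"state": "closed"}. With no changes, the request only reads the issue.
    """
    url = f"{API}/repos/{repo}/issues/{number}"
    data = json.dumps(changes).encode() if changes is not None else None
    method = "PATCH" if changes is not None else "GET"
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Accept", "application/vnd.github+json")
    req.add_header("Authorization", f"Bearer {token}")
    return req


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping request construction separate from `send` makes it possible to review exactly what would be sent before touching any real issue.
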
hugovk commented 2 years ago

This is now done.

hugovk commented 1 year ago
  • [ ] Install actions in .github/actions/ on python/cpython

Is there anything to be done for this or can we check it off?