jeffwidman / bitbucket-issue-migration

A small script for migrating repo issues from Bitbucket to GitHub
GNU General Public License v3.0

BB Issue IDs may not match Github Issue IDs if BB issues were deleted #42

Closed jeffwidman closed 7 years ago

jeffwidman commented 8 years ago

For example, issue 3 was deleted.

Now the 3rd issue created in Github is BB issue 4, and the IDs get out of sync.

Out-of-sync IDs break the fix_links() conversion to relative GH URLs.

Test it on the SQLAlchemy repo: its issue IDs and its total issue count do not match, so it is guaranteed to hit this issue.

I'm not sure how to solve this, as GitHub doesn't let us specify the issue ID. Perhaps when it happens, create a placeholder issue in GitHub just to keep the IDs synced. Be sure to auto-close this placeholder issue.
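A rough sketch of the placeholder idea, assuming issues are created one at a time in ascending order so that GitHub hands out numbers 1, 2, 3, ... The function name `plan_with_placeholders` is hypothetical and not part of the script:

```python
def plan_with_placeholders(bb_issue_ids):
    """Return one (github_number, bb_id_or_None) pair per number from 1 to the highest BB ID.

    A pair whose second item is None marks a slot where a closed placeholder
    issue would have to be created to keep the numbering aligned.
    """
    present = set(bb_issue_ids)
    if not present:
        return []
    return [(n, n if n in present else None) for n in range(1, max(present) + 1)]


# BB issue 3 was deleted, so slot 3 needs a placeholder.
plan = plan_with_placeholders([1, 2, 4, 5])
placeholders = [number for number, bb_id in plan if bb_id is None]
```

With this plan, every real issue keeps its BB number on GitHub, at the cost of one extra (auto-closed) issue per gap.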

jeffwidman commented 8 years ago

This is a legit problem--example: https://bitbucket.org/zzzeek/sqlalchemy/issues/2479

jeffwidman commented 8 years ago

Added a test to catch this in 0dc1824cded9ce76677ea8f987baedcc073eee06

jeffwidman commented 8 years ago

As noted in https://github.com/jeffwidman/bitbucket-issue-migration/issues/70#issuecomment-176901831, can't just track the offset here for fix_links() because of the following scenario:

  1. issue 1 created
  2. issue 2 created
  3. issue 3 created, and then issue 1 edited to include a reference to issue 3.
  4. issue 2 deleted.

If I just track an offset, when the link in issue 1 to issue 3 is converted, the way the script is currently written means it won't be aware of the offset created by issue 2 being deleted. Could do a mapping, but that feels a bit overblown.
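To make the failure concrete, here is a toy model of a running offset. It is not the script's actual code: `convert_with_running_offset` and its input shape are assumptions made for illustration only.

```python
def convert_with_running_offset(issues):
    """Rewrite references while walking issues in ascending BB ID order.

    `issues` is a list of (bb_id, referenced_bb_ids) tuples. The offset only
    reflects gaps seen before the issue currently being converted.
    """
    offset = 0
    next_expected = 1
    converted = {}
    for bb_id, refs in issues:
        offset += bb_id - next_expected  # IDs skipped since the previous issue
        next_expected = bb_id + 1
        converted[bb_id] = [ref - offset for ref in refs]
    return converted


# Issue 2 was deleted; issue 1 contains a link to issue 3.
result = convert_with_running_offset([(1, [3]), (3, [])])
```

When issue 1 is converted the offset is still 0, so its link keeps pointing at 3, even though BB issue 3 ends up as GitHub issue 2.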

So probably the most pragmatic solution is still to insert a blank placeholder issue.

mithrandi commented 8 years ago

I just wanted to point out that it's possible for issues to be non-contiguous even if no issues were deleted; for example, when we originally migrated from Launchpad to BitBucket, we kept the Launchpad bug numbers for everything. Since Launchpad bug numbers are global (a single bug can be associated with multiple projects), our bug numbers start at a high number (#433605), and have lots and lots of holes (latest issue is #1228543).

jeffwidman commented 8 years ago

@mithrandi thanks, I was ignorant of that--definitely affects the proposed solution here. Likely I'll need to instead just make the link converter function optional, so that it can be disabled for these scenarios.

Did Bitbucket support specifying the issue ID when you imported over from Launchpad?

mithrandi commented 8 years ago

It's been a while since we did the migration to BitBucket (and I can't find the script we wrote to do it), but I believe it was done by preparing the data in a suitable format for the Import/Export feature, documented here. I don't think you can create an issue through the API with an arbitrary issue ID, but you can specify it in the import data.

By the looks of things, the GitHub issues import API does not support this, and creating hundreds of thousands of placeholder issues is obviously not a great idea. Perhaps they would add it if we ask nicely? On the other hand, at least in our particular case, we probably don't care too much about keeping the issue IDs; we mainly did it in the LP→BB migration because we could, not because it was important. (If it wasn't clear, we're looking to migrate from BB→GH now.)

mithrandi commented 8 years ago

Aha, found the script: https://bitbucket.org/jonathanj/lp2bb

jeffwidman commented 8 years ago

@mithrandi the GitHub issue import API uses a task queue to process imports, so issues are not created immediately. So I doubt they'd add support for specifying IDs, as it'd likely require changing the whole infrastructure underlying issue importing.

Given the wide range of holes in the issue IDs from Launchpad, it sounds like it's actually best to create a mapping from issue IDs in BB to the expected IDs in GitHub. This mapping should also account for existing issues in the GH destination repo (#70). Then run the import (and hope that nobody files a new issue during the import process... probably add a note to the docs encouraging folks who really care to start with a blank repo, and only migrate the code if the issues migrate successfully, otherwise to redo the issue import).
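A minimal sketch of such a mapping, assuming issues are imported in ascending BB ID order and nothing else is filed in the destination repo meanwhile. `build_id_mapping` and `last_gh_number` are hypothetical names; `last_gh_number` stands for the highest number already used in the destination repo (GitHub numbers issues and pull requests from the same sequence).

```python
def build_id_mapping(bb_issue_ids, last_gh_number=0):
    """Map each BB issue ID to the GitHub number it is expected to receive."""
    ordered = sorted(set(bb_issue_ids))
    return {
        bb_id: last_gh_number + position
        for position, bb_id in enumerate(ordered, start=1)
    }


# Sparse Launchpad-style IDs (the middle one is made up for illustration),
# imported into a repo that already has two issues.
mapping = build_id_mapping([433605, 433610, 1228543], last_gh_number=2)
```

Because the mapping is built before anything is imported, a link from an early issue to a later one can be rewritten correctly no matter where the gaps are.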

This will require fetching all issues & issue comments from BB before starting to import anything into GitHub so that they can all be processed in one go once the mapping is finished. Currently comments are fetched as an issue is imported.

Also could make fix_links() controlled by an optional flag on the command line, defaulting to enabled, so that people can disable it if desired. Personally, I think it's extremely convenient to run fix_links() for updating crosslinks between issues.
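One way such an opt-out could look with argparse. The flag name `--no-fix-links` is purely hypothetical here and is not an option the script is known to have:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Sketch of an opt-out flag for link fixing"
    )
    # Hypothetical flag: store_false makes args.fix_links default to True.
    parser.add_argument(
        "--no-fix-links",
        dest="fix_links",
        action="store_false",
        help="leave issue cross-links untouched instead of rewriting them",
    )
    return parser


default_args = build_parser().parse_args([])
disabled_args = build_parser().parse_args(["--no-fix-links"])
```

The import loop would then call fix_links() only when `args.fix_links` is true.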

mithrandi commented 8 years ago

You should probably include the original BB ID somewhere in the migrated issue at least, since this may appear elsewhere (eg. in VCS commits that were migrated).
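A small sketch of how the original ID could be preserved, assuming the issue body is plain text and the Bitbucket URL follows the pattern seen earlier in this thread. `add_origin_note` is a hypothetical helper name:

```python
def add_origin_note(body, bb_repo, bb_issue_id):
    """Prepend a line recording where the issue originally lived."""
    url = "https://bitbucket.org/{}/issues/{}".format(bb_repo, bb_issue_id)
    note = "Migrated from Bitbucket issue #{} ({})".format(bb_issue_id, url)
    return "{}\n\n{}".format(note, body)


migrated = add_origin_note("Original description.", "zzzeek/sqlalchemy", 2479)
```

Keeping the old ID in the body means a commit message that mentions the BB number can still be matched to the migrated issue via search.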

jaraco commented 8 years ago

I encountered this issue today attempting to migrate CherryPy with a missing issue.

thomasjsn commented 8 years ago

@jaraco Thanks for your work on filling gaps, works beautifully :)