jeffwidman / bitbucket-issue-migration

A small script for migrating repo issues from Bitbucket to GitHub
GNU General Public License v3.0

Issues can be created out of order #45

Closed jaraco closed 8 years ago

jaraco commented 8 years ago

I'm currently attempting to migrate https://bitbucket.org/jaraco/hgtools. I'm using jaraco/bitbucket-issue-migration#refactor-and-python3 to get the keyring support, but the technique for importing the issues into GitHub should be the same.

When I run the migration, I get different results for different runs. In a recent test, I ran:

python migrate.py jaraco hgtools jaraco jaraco/hgtoolstest

Which yielded lines including:

Created bitbucket issue 27 (27/31): hgtools is a weird name [1 comments]
...
Created bitbucket issue 31 (31/31): Automatic versioning breaks if mercurial repository is synced with a GIT upstream through hg-git extension [0 comments]
Created 31 issues

However, despite the script having created the issues in order, Bitbucket issue 27 appears as GitHub issue 28, and it looks like issue 31 was processed as 25.

But the behavior is not deterministic. In another run, there were more and different discrepancies.

It seems there may be a race condition in the GitHub issue importer such that it can get some issues out of order. I'm not yet sure what can or should be done.

jaraco commented 8 years ago

Adding a time.sleep(1) between each issue upload seems to have been sufficient to work around the issue for this repository in today's migration. At least, I was able to complete a successful migration and a spot check indicates the issues were created in order.
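For reference, the workaround amounts to something like the sketch below. This is only an illustration, not the script's actual code: `upload_with_delay` and its `upload` callable are hypothetical names, and the `sleep` parameter exists only so the pause can be swapped out when testing.

```python
import time


def upload_with_delay(issues, upload, delay=1.0, sleep=time.sleep):
    """Upload issues one by one, pausing `delay` seconds between uploads.

    `upload` is a hypothetical callable standing in for whatever sends a
    single issue to the import API.
    """
    results = []
    for index, issue in enumerate(issues):
        if index > 0:
            # Give the previous import time to finish before starting the next.
            sleep(delay)
        results.append(upload(issue))
    return results
```

A fixed delay is only a guess at how long an import takes, so an issue with many comments could still finish late and end up out of order.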

Ideally, the import API should provide a mechanism to block until the issue is confirmed to be committed. Either that, or the importer should do a double-check to confirm the issue is committed before proceeding to the next.

jeffwidman commented 8 years ago

Thanks for reporting.

Good call on checking via multiple runs; I'm surprised you got non-deterministic results.

time.sleep() is probably the right temporary hack, but I agree it'd be nicer if we just double-check that the import happened. I'm not sure what we'd do if the double-check fails, though; probably just sleep and recheck.

I also completely agree that the best fix here would be one on GitHub's side. I'll file a support ticket with them and see what they say.

Another issue that may cause non-correlated ID numbers is #42 - planning to fix that in the next week or two.

jeffwidman commented 8 years ago

GitHub's detailed reply:

The behavior you observed is expected -- if you use the issues import API, and I submit two imports for issues A and B, and B is submitted after A, but before A's import completes, then it is possible that B will have a lower issue number than A.

It's possible that the processing of A takes a long time (because it has lots of comments, for example), so it completes after B even though it was started before B, and for that reason A gets a higher issue number than B. Reserving issue numbers for imports before they complete (e.g. when you trigger the import) would cause "holes" in the numbers for issues that fail to import (e.g. due to failing validations) -- you'd see issues 1,2,3,5,6,9,10, etc, where the missing numbers are issues that failed to import.
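To make the ordering effect concrete, here is a toy simulation, not GitHub's implementation. It assumes only what the reply states: numbers are handed out when an import completes. The `assign_numbers` function and the timings are made up for illustration.

```python
def assign_numbers(imports):
    """Assign issue numbers in order of completion time.

    `imports` is a list of (name, start_time, duration) tuples; an import
    completes at start_time + duration and receives the next free number.
    """
    finished = sorted(imports, key=lambda item: item[1] + item[2])
    return {name: number for number, (name, _, _) in enumerate(finished, start=1)}


# A is submitted first but takes 5 seconds; B is submitted 1 second later
# and takes only 1 second, so B completes first and gets the lower number.
print(assign_numbers([("A", 0.0, 5.0), ("B", 1.0, 1.0)]))  # {'B': 1, 'A': 2}
```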

As mentioned in the gist which describes the issue API, imports are done in the background. When you trigger an import -- you get back an import ID which you can use to check if the import completed, and if it did -- did it succeed or fail. For that reason (background processing), issue numbers are handed out to successfully imported issues once the processing completes, and not when you trigger the import.

If you need to make sure that issues have specific numbers, then the recommended approach is that you trigger an import, wait for that import to complete and make sure it was successful, and only after that trigger a new import.
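The recommended approach could be sketched roughly as follows. This is not the code that ended up in migrate.py: `trigger_import` and `check_status` are hypothetical callables wrapping the import API, and the status strings "pending", "imported" and "failed" are assumptions about what the status check returns.

```python
import time


class ImportFailed(Exception):
    """Raised when an import reports that it failed."""


def wait_for_import(check_status, poll_interval=1.0, max_polls=30, sleep=time.sleep):
    """Poll until the import completes; return how many extra polls were needed.

    `check_status` is a hypothetical zero-argument callable returning
    "pending", "imported" or "failed" (assumed status values).
    """
    for attempt in range(max_polls):
        status = check_status()
        if status == "imported":
            return attempt
        if status == "failed":
            raise ImportFailed("import reported failure")
        # Still pending: only now do we sleep, then check again.
        sleep(poll_interval)
    raise TimeoutError("import still pending after %d polls" % max_polls)


def import_in_order(issues, trigger_import, check_status, **wait_kwargs):
    """Trigger one import at a time, waiting for each before starting the next."""
    for issue in issues:
        import_id = trigger_import(issue)
        wait_for_import(lambda: check_status(import_id), **wait_kwargs)
```

Unlike a fixed delay, this sleeps only while an import is still pending, and a slow import simply gets polled more times.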

I'll try to get this fixed soon, as it's annoying to have different issue IDs between GitHub and Bitbucket.

jeffwidman commented 8 years ago

So I think I have this fixed on my machine, but Bitbucket's API is flaky on 9 requests out of 10 right now, so I can't test my code with any confidence. Hopefully tomorrow I can verify the fix works.

jeffwidman commented 8 years ago

@jaraco can you verify this works for you?

It will be faster than using time.sleep() for all issues, as it only sleeps if the import is still pending. In my tests, fewer than 20% of issues needed extra time for the import. It should also be more accurate, as I did hit a few issues that needed several seconds to import.

jaraco commented 8 years ago

I very much appreciate the more robust implementation. I ran the migration in two separate attempts on the same issue tracker that experienced the problem, and it succeeded, so I believe the issue is fixed. Thanks!

jeffwidman commented 8 years ago

Thanks again for reporting; I had no idea this was even an issue until you mentioned it.