The fix to the infinite loop problem

mfs-mindsize commented 5 years ago

Describe the bug The current logic in the migrate() method is effectively...

Join the migrated orders (in the new table) to the posts table (i.e., those still be migrated) and grab a batch-size worth of rows from the posts table, and migrate those unmigrated orders. This query always starts at the "top" of the posts table.

While looping over the batches, "bad orders" (i.e. those that can't be migrated for some reason) are added to $this->skipped_ids[]. The code checks against that array so those ids are ignored.

However, the ids in the skipped_ids are not excluded from the query. That is, those ids continue to be selected by the query. Yes, they continue to go unprocessed since they're in the skipped_ids array. But they are still being selected and processed (but not migraetd) - over and over and over :)

As the # of skipped ids increases, the # of row actually being migrated goes down. That is, the batch slowly fills with bad orders. Put another way, it takes more queries to get through all the unmigrated posts (orders).

For example, let's say there are 100,000 total orders. But once 40,000 are processed there are 50 ids in skipped_ids[] and the batch size is 100. That means, only 50 new / unprocessed orders are being processed in any query / batch. The other 50% of the batch is re-processed "bad order".

The killer is this...Eventually, if the # of bad orders is equal to the batch size, the infinite loop condition will be hit. That is, the infinite loop condition is (effectively) "hey! I just did this same batch and I'm not going to do it again. " If there's a batch worth of bad orders then it's going to look like an infinite loop. What's actually happening is, there's no more room for good orders, even if there are still good orders to be process.

To Reproduce Use a batch size that is smaller than the # or "bad orders."

Expected behavior There are (at least) two options

1) Don't use batch-size (i.e., process it all in one fell swoop). If there's no batch then the infinite loop condition can't be met. There's no way to compare one query to the next since there's only one query :)

2) The query should offset by the # of skipped ids. That is, avoid going back to the top and selecting the orders that are already known to not be migration worthy. Skip them.

stevegrunwell commented 4 years ago

I'll be honest: I'm embarrassed that I didn't catch this, but you're absolutely right. This, along with #146, will make a huge difference in the quality of migrations.

Another option not mentioned above is adding a NOT IN clause to the query, though an offset would work just as well (especially if someone has hundreds of "bad" orders).

Thank you for the perspective!

mfs-mindsize commented 4 years ago

@stevegrunwell - Glad to help, of course. It's not easy to spot because I don't think anyone is really expecting to have that many un-migratable orders. That said, truthful I was proud to have tracked it down, especially for such an old (and closed) issue :)

Obviously, there are a number of ways to fix this. I used offset to highlight the nature of the problem in the event my explanation up to that point wasn't clear. Also, for reasons I can't explain IN always strikes me as being "too intensive". SQL by its nature isn't strings-minded. Given the choice, I sidestep IN. But my fear could be myth/false.

liquidweb / woocommerce-custom-orders-table

The fix to the infinite loop problem #148