wp-cli / search-replace-command

Searches/replaces strings in the database.
MIT License
57 stars 45 forks source link

Address long-running queries and OOM contributor in PHP replacement #176

Closed brandonpayton closed 1 year ago

brandonpayton commented 1 year ago

This change makes PHP replacement queries and memory usage more efficient in order to address issues we've encountered with long-running queries and OOM conditions with certain sites.

It improves the primary key SELECT queries by eliminating the use of OFFSET because OFFSET requires that the database consider all rows up to OFFSET before taking rows up to the LIMIT. This can make queries become slower as OFFSET increases. The new SELECT query relies on primary key conditions to more efficiently eliminate previous rows from consideration. This way, the database can use an index to identify rows with keys greater than those of the previous chunk and then take rows from that set up to the LIMIT.

It improves memory usage by doing updates along the way rather than storing all a column's updated values in memory until the end. At Automattic, when we limit search-replace to 4GB of memory, we sometimes exceed that limit for large sites. It's possible there are other things that contribute to high memory usage within the search-replace command, but as a first step, we can reduce memory requirements by no longer keeping all updated column values in memory simultaneously.

danielbachhuber commented 1 year ago

@brandonpayton Has this been running in production for some time? Also, are there any particular test cases you think we should add to cover the change?

brandonpayton commented 1 year ago

@danielbachhuber, this is not running in production yet but should be next week. I'll withdraw this PR for now and plan to resubmit later after it has been in production for a while. I figured the current test cases would cover this code as the resulting replacements should not change, but it would be a good idea for me to review the tests and confirm that. I can do that before resubmitting the PR. Thanks!

danielbachhuber commented 1 year ago

I'll withdraw this PR for now and plan to resubmit later after it has been in production for a while.

@brandonpayton Sounds good, thanks!

I figured the current test cases would cover this code as the resulting replacements should not change, but it would be a good idea for me to review the tests and confirm that.

I think the functional existing tests are pretty good but may or may not be comprehensive enough to cover this. It could be fine or there could be some minor idiosyncrasy that causes problems.

brandonpayton commented 1 year ago

The regex path of this change has been used in production for the last 3-4 months, and we have encountered no issues so far. So I am reopening this PR for consideration.

We already had tests for the chunked precise search-replace but none for the chunked regex path, so I added some based on the "precise" tests.

brandonpayton commented 1 year ago

Actually, I do not have permission to reopen. :) @danielbachhuber, are you up for reopening this PR?

danielbachhuber commented 1 year ago

@brandonpayton It seems like the branch disappeared for me too:

image

Want to create a new PR?

brandonpayton commented 1 year ago

I'll withdraw this PR for now and plan to resubmit later after it has been in production for a while.

It was the originally stated plan anyway, but I figured "why not just reopen?" I guess this is why :)

Thanks for looking, @danielbachhuber!