wp-cli / search-replace-command

Searches/replaces strings in the database.
MIT License

Search-Replace command not working on gutenberg serialized content #126

Closed: leila-racke-zz closed this issue 5 years ago

leila-racke-zz commented 5 years ago

I'm not sure whether this should be a feature request or not, but I noticed that when I ran search-replace, mostly for switching domains, it seemed to skip over the Gutenberg content. I believe that's because this data is serialized in a specific way.

kyodev commented 5 years ago

For serialized content, you have to use the --precise option:

[--precise]
Force the use of PHP (instead of SQL) which is more thorough, but slower.

Do you use it?
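As background for the --precise suggestion: PHP's serialize() writes a string as s:<byte length>:"<value>";, so a plain text-level replace that changes the string's length leaves a stale length prefix behind. The sketch below only mimics PHP's string serialization in Python to show the mismatch; it is not wp-cli's code, and the domains are the hypothetical ones from later in this thread.

```python
# Minimal sketch: why a plain text replace can corrupt PHP-serialized strings.
# PHP serializes a string as s:<byte length>:"<value>"; and unserialize()
# rejects the value if the length prefix does not match.


def php_serialize_str(value: str) -> str:
    """Mimic PHP's serialize() output for a single string value."""
    return f's:{len(value.encode("utf-8"))}:"{value}";'


old_url = "http://stagingdomain.com"   # hypothetical staging URL
new_url = "https://livedomain.com"     # hypothetical live URL

stored = php_serialize_str(old_url)        # s:24:"http://stagingdomain.com";
naive = stored.replace(old_url, new_url)   # s:24:"https://livedomain.com";  (stale length)
correct = php_serialize_str(new_url)       # s:22:"https://livedomain.com";  (recomputed)

print(stored)
print(naive)
print(correct)
```

Handling the value in PHP, as the help text describes, lets the length be recomputed after the replacement instead of being left stale.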

schlessera commented 5 years ago

@leila-racke Do you have a specific example of what Gutenberg content you are referring to? The HTML comment attributes in the post content?

leila-racke-zz commented 5 years ago

@kyodev Hey there, I did try the --precise flag; that was actually the first thing I tried. Let me see if I can dig up an example of something escaped oddly. I'm using an ACF integration with Gutenberg, so it may be more related to that than to the CLI tool, but I'm unsure.

<!-- wp:acf/steps { "id": "block_5d6029485a84e", "name": "acf\/steps", "data": { "steps_0_title": "Register", "_steps_0_title": "field_5d6029211914d", "steps_0_details": "Register for the Medical Marijuana program at <a title=\"Register\" href=\"http:\/\/medicalmarijuana.oh.gov\" target=\"_blank\" rel=\"noopener\">medicalmarijuana.oh.gov<\/a>", "_steps_0_details": "field_5d6029261914e", "steps_1_title": "Get Your Card", "_steps_1_title": "field_5d6029211914d", "steps_1_details": "Complete registration by paying for a medical marijuana ID card.", "_steps_1_details": "field_5d6029261914e", "steps_2_title": "Find A Practicioner", "_steps_2_title": "field_5d6029211914d", "steps_2_details": "Obtain a patient certification that you suffer from one of the <a href=\"#\">17 serious<\/a> medical conditions. Or <a href=\"#\">find a practitioner<\/a> on the DOH website.", "_steps_2_details": "field_5d6029261914e", "steps_3_title": "Visit A Dispensary", "_steps_3_title": "field_5d6029211914d", "steps_3_details": "Visit a dispensary in Ohio to obtain medical marijuana. <a href=\"#\">Find a dispensary near you.<\/a>", "_steps_3_details": "field_5d6029261914e", "steps": 4, "_steps": "field_5d6029101914c" }, "align": "full", "mode": "auto" } /-->

That's just a sample of how it was escaped in the database. The biggest issue I'm running into is that I'm trying to replace something like http://stagingdomain.com with https://livedomain.com, and I can't get the CLI tool to match the backslash-escaped slashes in the protocol (http:\/\/). I've had to resort to replacing just the domain and then going back and replacing http with https to get around it, but it would be nice to have a more elegant and less work-intensive solution.
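The attributes in the block comment above are JSON, where \/ is a legal escape for /. A literal search for the plain URL therefore never matches the stored text, even though the decoded value contains it. This Python sketch uses a trimmed excerpt of the sample; it only illustrates the mismatch and is not how wp-cli searches.

```python
import json

# Trimmed excerpt of the block attributes above, exactly as stored (note the \/ escapes).
attrs_json = r'{"steps_0_details": "Register at <a href=\"http:\/\/medicalmarijuana.oh.gov\">medicalmarijuana.oh.gov<\/a>"}'

plain = "http://medicalmarijuana.oh.gov"
escaped = plain.replace("/", "\\/")  # http:\/\/medicalmarijuana.oh.gov

print(plain in attrs_json)    # False: the stored text has \/ where the search has /
print(escaped in attrs_json)  # True: the escaped form is what is actually in the database

# Decoding the JSON turns each \/ back into /, so the plain URL appears again.
decoded = json.loads(attrs_json)
print(plain in decoded["steps_0_details"])  # True
```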

Edit & Update: I ended up including both the escaped and non-escaped versions in my wp-cli search-replace command, and that seems to resolve my issue. I'll likely just update my build process for now; I imagine this isn't a feature that will be added anytime soon.
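The workaround of covering both the escaped and non-escaped versions amounts to two strict search-replace passes, one per spelling of the URL. The Python sketch below models those passes on a plain string; the domains are the hypothetical ones from this thread, and the shell lines in the comment are an assumed way of expressing the same passes as separate wp search-replace runs, not the commenter's actual build script.

```python
# Sketch of the "both versions" workaround as two strict passes.
# Assumed wp-cli equivalent (two separate runs; single quotes keep the backslashes):
#   wp search-replace 'http://stagingdomain.com' 'https://livedomain.com'
#   wp search-replace 'http:\/\/stagingdomain.com' 'https:\/\/livedomain.com'

OLD = "http://stagingdomain.com"  # hypothetical staging URL
NEW = "https://livedomain.com"    # hypothetical live URL


def json_escape_slashes(url: str) -> str:
    """Write every '/' as '\\/', the way it appears in the block attributes."""
    return url.replace("/", "\\/")


# Each (search, replacement) pair is one pass; the escaped pass keeps escaping intact.
PASSES = [
    (OLD, NEW),
    (json_escape_slashes(OLD), json_escape_slashes(NEW)),
]


def replace_both(text: str) -> str:
    for search, replacement in PASSES:
        text = text.replace(search, replacement)
    return text


sample = "Plain: http://stagingdomain.com/page Escaped: http:\\/\\/stagingdomain.com\\/page"
print(replace_both(sample))
# Plain: https://livedomain.com/page Escaped: https:\/\/livedomain.com\/page
```

Keeping the passes separate and literal matches the strictness argued for later in the thread: each pass changes only text that matches its exact spelling.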

schlessera commented 5 years ago

Indeed, we cannot simply add automatic unslashing here, as it would lead to unexpected results. The general rule for search-replace is to be as strict as possible to avoid unintended changes, even if it means you need to do multiple passes.

However, I think you might be able to solve the above in one pass with judicious use of the --regex family of flags. I'm not sure it is worth the effort, though.
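A one-pass regex version could capture whichever separator follows "http:" (either / or the escaped \/) and reuse it in the replacement, so escaped text stays escaped and plain text stays plain. The sketch below uses Python's re module to show the idea; wp-cli's --regex uses PHP's PCRE engine, so the delimiter, quoting, and backreference syntax of an actual wp-cli invocation are assumptions to verify, and the domains are hypothetical.

```python
import re

# Group 1 captures the separator after "http:", either "/" or the JSON-escaped "\/".
# The backreference \1 requires the second separator to use the same spelling.
PATTERN = re.compile(r"http:(\\?/)\1stagingdomain\.com")  # hypothetical staging domain


def replace_once(text: str) -> str:
    # \g<1> reinserts the matched separator literally, so the escaping is preserved.
    return PATTERN.sub(r"https:\g<1>\g<1>livedomain.com", text)  # hypothetical live domain


sample = "Plain: http://stagingdomain.com/page Escaped: http:\\/\\/stagingdomain.com\\/page"
print(replace_once(sample))
# Plain: https://livedomain.com/page Escaped: https:\/\/livedomain.com\/page
```

The trade-off matches the comment: one pass instead of two, at the cost of a pattern that is harder to read and easier to get subtly wrong than two literal replacements.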

Closing this as the behavior is as intended.