szepeviktor closed this 7 months ago
Same goes for the search-replace command.
How to search in UTF-8 encoded text?
@szepeviktor I'm not sure I follow. Can you share an example of what you tried, what you saw, and what you expected to see?
> Can you share an example of what you tried, what you saw, and what you expected to see?
Issued this command: wp db search '\p{Cf}' --regex
Saw results like: blog h▒▒rlevél feliratkozás
One of the "block" characters was highlighted, so the two-byte UTF-8 character had been split into its two separate bytes.
The \p{Cf} regular expression matches "Format characters". I am looking for U+200B ZERO WIDTH SPACE and other invisible characters in post content and in post meta.
https://unicode.org/charts/PDF/U2000.pdf
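For anyone following along, here is a small sketch of what such an invisible character looks like at the byte level. It assumes a POSIX shell with od and tr available; the variable names are only illustrative.

```shell
# U+200B ZERO WIDTH SPACE, written as octal escapes so any POSIX printf works.
zwsp=$(printf '\342\200\213')

# Dump the bytes as hex: one invisible character occupies three bytes in UTF-8.
zwsp_hex=$(printf '%s' "$zwsp" | od -An -tx1 | tr -d ' \n')
echo "$zwsp_hex"   # prints e2808b
```

A search that works byte by byte sees three separate units here, not one character.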
You need to add --regex-flags=u to be UTF-8 compatible 😃
https://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
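Putting the original command and the suggested flag together, a minimal sketch could look like this. The function name find_format_chars is hypothetical; the wp arguments are the ones quoted in this thread, and running it assumes WP-CLI is installed inside a WordPress site.

```shell
# Hypothetical wrapper name; the wp arguments are exactly those from this thread.
find_format_chars() {
    # Single quotes keep the shell from touching the backslash in \p{Cf}.
    # --regex-flags=u hands the PCRE "u" modifier to the search, so the
    # pattern and the subject are treated as UTF-8 rather than as raw bytes.
    # Any extra arguments are passed through unchanged.
    wp db search '\p{Cf}' --regex --regex-flags=u "$@"
}
```

Calling find_format_chars then runs the search with UTF-8 matching enabled.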
@szepeviktor Glad I was here to help you figure it out! 😁
Bug Report
Describe the current, buggy behavior
A regexp search for character classes matches individual bytes of a UTF-8 encoded character. For example, when it hits the í in "hírlevél", the result is displayed as "blog h▒▒rlevél feliratkozás".
How to search in UTF-8 encoded text?
BTW, wp db search "$(printf '\xc3')" --regex also finds the first byte of í (and of every other two-byte character whose encoding starts with 0xC3, such as é).
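The byte layout behind this can be checked without WordPress at all. A rough sketch, assuming a POSIX shell with tr and wc; the variable names are illustrative:

```shell
# "hírlevél" spelled out byte by byte: í is C3 AD and é is C3 A9 in UTF-8.
word=$(printf 'h\303\255rlev\303\251l')

# wc -c counts bytes, not characters: 8 characters, two of them two bytes long.
nbytes=$(printf '%s' "$word" | wc -c | tr -d ' ')

# Keep only the 0xC3 bytes and count them (LC_ALL=C forces byte-wise handling).
hits=$(printf '%s' "$word" | LC_ALL=C tr -cd '\303' | wc -c | tr -d ' ')

echo "$nbytes bytes, $hits of them 0xC3"   # prints: 10 bytes, 2 of them 0xC3
```

Both accented letters share the lead byte 0xC3, which is why a byte-wise search for it lands in the middle of each of them.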