processwire / processwire-issues

ProcessWire issue reports.
45 stars 2 forks source link

MySQL 8 issue with "contains" selector operator and single quotes #1950

Open chris-shorter opened 4 months ago

chris-shorter commented 4 months ago

Short description of the issue

When using the "contains" selector (e.g. *=) and a search term wrapped in opening and closing single quote characters of and , e.g. "‘aaa’", while using MySQL 8.0.x - a fatal error occurs.

MySQL 5.7 and Mariadb do not seem to have this problem

Expected behavior

Standard page find is performed

Actual behavior

The following error occurs:

Error: Exception: SQLSTATE[HY000]: General error: 3854 Cannot convert string '(\b|[[...' from utf8mb3 to utf16le (in wire/core/PageFinder.php line 821)

Optional: Suggestion for a possible fix

Might not be the best method but we have found it is possible to fix this by changing the rlikeValue method of DatabaseQuerySelectFulltext.php to use multibyte safe preg_replace - however I'm not sure how desirable this is since it would add a new extension requirement to PW (albeit a common one) - e.g.

// DatabaseQuerySelectFulltext.php lines 1270-1275
if(strpos($likeValue, "'") !== false || strpos($likeValue, "’") !== false) {
    // match either straight or curly apostrophe
    $likeValue = mb_ereg_replace('/[\'’]+/', '(\'|’)', $likeValue);
    // if word ends with apostrophe then apostrophe is optional
    $likeValue = rtrim(mb_ereg_replace("('|’) ", "('|’)? ", "$likeValue "));
}

I'm also not an expert php developer and haven't explored the wider implications of this or tested if it affects other selector operators.

Steps to reproduce the issue

  1. Start an environment running mysql 8.0. I have reproduced this on ddev and on a cPanel server
  2. Perform a selector lookup for a term wrapped in single quote characters - this part is slightly inconsistent as ‘ipsum dolor’ will error but ‘ipsum’ will not, ‘aaa’ will error but ‘aaaa’ will not, however the following code should produce an error:
<?php

    $searchterm = "‘ipsum dolor’";
    $results = $pages->find("title*=$searchterm");

    foreach ($results as $result) {
        echo $result->title . '<br>';
    }

?>

Sanitizing the search term with selectorValue doesn't make a difference.

Setup/Environment

adrianbj commented 4 months ago

Hi @chris-shorter - it looks like you should be able to fix by changing the character set from utf8mb3 to utf8mb4.

It has been discussed here: https://stackoverflow.com/questions/72180043/incorrect-string-value-error-for-unconventional-characters

Do you have any particular reason for everything being in utf8mb3? Keep in mind that these days utf8mb4 is faster than utf8mb3.

chris-shorter commented 4 months ago

I didn't really opt for one or the other really, this is just what I had after a fresh install of PW on a default ddev environment