jackalope / jackalope-doctrine-dbal

Doctrine DBAL transport implementation for Jackalope
http://jackalope.github.io
Other
143 stars 60 forks source link

MySQL utf8mb4 limitations #354

Closed dbu closed 6 years ago

dbu commented 6 years ago

MySQL implemented utf8 support without 4 byte characters and added the utf8mb4 encoding to work around the issue. Indexes for utf8mb4 can not be more than 191 characters.

Postgres is not affected as they implemented full utf8 from the beginning. I did not find anything about this for sqlite, but assume for now that its also not affected.

dbu commented 6 years ago

@alexander-schranz while reading up on the topic, i discovered that it would also be possible to build prefix indexes that only use the first X characters of a field. i wonder if this is what we should do for the path and parent fields? paths can easily become too long.

there is of course a hefty performance penalty when fields are longer than the prefix, as those need to then be scanned without index - but the limitation on full path length seems quite arbitrary and 191 is not a lot of characters...

alexander-schranz commented 6 years ago

@dbu I did see that with the prefixed index but don't know if this is possible in doctrine? As you said its only a problem with mysql we should check for the database type:

The best way would be check over DatabasePlatform reading to the code I'm not sure if there is an additional request done:

if ('mysql' !== $this->connection->getDatabasePlatform()->getName()) {
    return false;
}

Another way would be over the driver but not sure if there are any other mysql drivers:

if ('pdo_mysql' !== $this->connection->getDriver()->getName()) {
    return false;
}
dbu commented 6 years ago

the prefix index thing should be possible with dbal, but while checking, i realize that we also add a unique index, and that seems the killer for this idea... so i guess we leave at that.

i don't think we need to check the platform - utf8mb4 is a mysql specific invention that postgres and sqlite don't even know about, so you can't accidentally set it for those.

do you agree with merging as is? can you please check the commit i added, do you see any flaw with how i make the length null instead of 255 for non utf8mb4? (i checked that the column class should be happy with setting the length to null explicitly)

alexander-schranz commented 6 years ago

@dbu thank you looks good for me and as the utf8mb4 seems to be a mysql only thing I agree to merge it.

alexander-schranz commented 6 years ago

@dbu I tried to add utf8mb4 to the tests but could not get it to work as the tables seems to be created still with utf8 not utf8mb4 in the test setup I checked in my information_schema that there is the default set correctly for the phpcr_tests table.

My tried changes ``` diff --git a/phpunit.xml.dist b/phpunit.xml.dist index d47021c..30680d1 100644 --- a/phpunit.xml.dist +++ b/phpunit.xml.dist @@ -18,12 +18,16 @@ - - + - + + + + + + diff --git a/src/Jackalope/Transport/DoctrineDBAL/Client.php b/src/Jackalope/Transport/DoctrineDBAL/Client.php index e28d51a..aff2f2c 100644 --- a/src/Jackalope/Transport/DoctrineDBAL/Client.php +++ b/src/Jackalope/Transport/DoctrineDBAL/Client.php @@ -464,6 +464,12 @@ class Client extends BaseTransport implements QueryTransport, WritingInterface, $params = $this->conn->getParams(); $charset = isset($params['charset']) ? $params['charset'] : 'utf8'; + if (isset($params['default_table_options']) && $params['default_table_options']['collate']) { + $this->caseSensitiveEncoding = $params['default_table_options']['collate']; + + return $this->caseSensitiveEncoding; + } + return $this->caseSensitiveEncoding = $charset === 'binary' ? $charset : $charset . '_bin'; } diff --git a/tests/bootstrap.php b/tests/bootstrap.php index dfda39d..db3647d 100644 --- a/tests/bootstrap.php +++ b/tests/bootstrap.php @@ -38,7 +38,12 @@ $dbConn = DriverManager::getConnection([ 'port' => @$GLOBALS['phpcr.doctrine.dbal.port'], 'user' => @$GLOBALS['phpcr.doctrine.dbal.username'], 'password' => @$GLOBALS['phpcr.doctrine.dbal.password'], - 'dbname' => @$GLOBALS['phpcr.doctrine.dbal.dbname'] + 'dbname' => @$GLOBALS['phpcr.doctrine.dbal.dbname'], + 'charset' => @$GLOBALS['phpcr.doctrine.dbal.charset'], + 'default_table_options' => [ + 'charset' => 'utf8mb4', + 'collate' => 'utf8mb4_unicode_ci', + ] ]); /** Recreate database schema */ diff --git a/tests/generate_phpunit_config.php b/tests/generate_phpunit_config.php index f635b67..05859c6 100644 --- a/tests/generate_phpunit_config.php +++ b/tests/generate_phpunit_config.php @@ -15,6 +15,11 @@ $config = [ 'phpcr.doctrine.dbal.username' => 'travis', 'phpcr.doctrine.dbal.password' => '', 'phpcr.doctrine.dbal.dbname' => 'phpcr_tests', + 'phpcr.doctrine.dbal.charset' => 'utf8mb4', + 'phpcr.doctrine.dbal.default_table_options' => [ + 'charset' => 'utf8mb4', + 'collate' => 'utf8mb4_unicode_ci', + ], ], 'pgsql' => [ 'phpcr.doctrine.dbal.driver' => 'pdo_pgsql', ```
dbu commented 6 years ago

hm, but it works in your setup? i would be ok to merge as is if you confirm it works.

alexander-schranz commented 6 years ago

@dbu

yes works for me correctly in sulu 👍 ![bildschirmfoto 2018-01-29 um 14 27 59](https://user-images.githubusercontent.com/1698337/35512619-b0c89cdc-0500-11e8-8749-44c5d4fefe6f.png)