singlestore-labs / singlestoredb-laravel-driver

The official SingleStore Laravel driver.
https://github.com/singlestore-labs/singlestore-laravel-driver
Apache License 2.0
223 stars 22 forks source link

Add support for whereFullText #9

Closed fgilio closed 1 year ago

fgilio commented 2 years ago

Hi Aaron!

Another draft PR here, it's missing tests and the implementation is just bare bones.

I found SingleStore's syntax is slightly different from MySQL, so the version from MySqlGrammar doesn't work with SingleStore.

What do you think of this? If I get time to work on the tests, I could also add support for HIGHLIGHT.

aarondfrancis commented 2 years ago

@fgilio this looks great! I added a few simple tests, but wanted to run something by you. I tried running the integration test that I added there and I get and error FULLTEXT KEY with unsupported type when trying to add the fulltext index. Do you get the same?

You can run the integration test via HYBRID_INTEGRATION=1 ./vendor/bin/phpunit --filter=fulltext. You'll need SingleStore credentials in a .env to run it.

fgilio commented 2 years ago

I was able to run the tests locally 🎉 and found the issue, it was not what I had in mind.

So, the problem is that "FULLTEXT is not supported when using the utf8mb4 collation". There's more information here: https://docs.singlestore.com/managed-service/en/reference/sql-reference/data-definition-language-ddl/create-table.html#fulltext-behavior

To make the test pass I just specified the charset and collation for this particular table, like this:

$this->createTable(function (Blueprint $table) {
    $table->id();
    $table->text('first_name');

    $table->fullText(['first_name']);
    $table->charset = 'utf8';
    $table->collation = 'utf8_unicode_ci';
});

Then I thought that this could be handled transparently inside the $table->fullText() index method, and that's the fix I ended up pushing. But I wonder if the method should just check that you are using the right collation and throw an exception if not.

What do you think?

sebastienfontaine commented 2 years ago

I was able to run the tests locally 🎉 and found the issue, it was not what I had in mind.

So, the problem is that "FULLTEXT is not supported when using the utf8mb4 collation". There's more information here: https://docs.singlestore.com/managed-service/en/reference/sql-reference/data-definition-language-ddl/create-table.html#fulltext-behavior

To make the test pass I just specified the charset and collation for this particular table, like this:

$this->createTable(function (Blueprint $table) {
    $table->id();
    $table->text('first_name');

    $table->fullText(['first_name']);
    $table->charset = 'utf8';
    $table->collation = 'utf8_unicode_ci';
});

Then I thought that this could be handled transparently inside the $table->fullText() index method, and that's the fix I ended up pushing. But I wonder if the method should just check that you are using the right collation and throw an exception if not.

What do you think?

I think you should, if possible, catch the error generated by the database, and offer a solution. Indeed if SingleStore makes it compatible with utf8mb4, the collation will be implicitly changed while it is useless. It will then be necessary to modify the package to remove the implicit collation.

fgilio commented 2 years ago

I think you should, if possible, catch the error generated by the database, and offer a solution. Indeed if SingleStore makes it compatible with utf8mb4, the collation will be implicitly changed while it is useless. It will then be necessary to modify the package to remove the implicit collation.

That's actually an interesting idea, I don't know if this is actually possible though... But it should be. Another option would be to document it in the readme, as something to keep in mind when adding this kind of indexes.

EDIT: I think this could be done by extending the build() method of laravel/framework/src/Illuminate/Database/Schema/Blueprint.php, but then at that point I truly wonder if it's ok for the package to do that.

Right now this is what I'd do:

  1. Document it in the readme.
  2. Catch the exception and suggest that the fix might be related to the charset+collation
  3. Avoid automagically tweaking the schema, so this would mean reverting that from the last commit.
carlsverre commented 2 years ago

Hey @fgilio, I am ok with the approach you outlined in your last comment. Automatically changing the schema is a bit tricky when it comes to charsets as it might silently corrupt user data later on (the user not being aware that the table no longer supports emojis for example).

fgilio commented 1 year ago

Hey @fgilio, I am ok with the approach you outlined in your last comment. Automatically changing the schema is a bit tricky when it comes to charsets as it might silently corrupt user data later on (the user not being aware that the table no longer supports emojis for example).

Awesome, let's do that