teamtnt / laravel-scout-tntsearch-driver

Driver for Laravel Scout search package based on https://github.com/teamtnt/tntsearch
MIT License
1.1k stars 144 forks source link

Allowing searching through all of the searchable items #131

Closed judahnator closed 6 years ago

judahnator commented 6 years ago

My other PR ( #130) left me with a bad taste in my mouth. I tossed and turned all night, not content with leaving data that the user has displayed an interest in making searchable out of the search index.

I spent some time reverse engineering how the data was being inserted into the indexes, and I believe I have come up with a solution that will allow the indexer to index all the data provided in the toSearchableArray() method and not just the available columns in the database.

Lets say I have a class, "Article," and it looks like this:

<?php

namespace App;

class Article extends Model
{
    use Searchable;

    protected $appends = ['published_at_atom'];

    // Relationship with the author.
    public function author() 
    {
        return $this->belongsTo(User::class, 'author_id');
    }

    // Format the published_at_atom attribute as an atom string for frontend use.
    public function getPublishedAtAtomAttribute(): string
    {
        return $this->published_at->toAtomString();
    }

    public function toSearchableArray(): array
    {
        return array_merge(
            $this->toArray(),
            ['author' => $this->author->full_name]
        );
    }
}

Before, when the indexer ran it would have all of the authors normal attributes, plus the appended "published_at_atom" and the additional "author" item. Unfortunately the indexer would ignore these two attributes, because they did not exist as columns in the database. Now though, I can do something like App\Article::search('author name') and have it return relevant results, where before it would not.

With this new change, instead of manually writing queries we can use native Laravel functions. By doing this Laravel will give us all the data we need in the toSearchableArray() method.

We can map over the toSearchableArray() output to prepare the word stems, and save those to the index directly without the need for raw SQL.

This has some huge advantages, but comes with one obvious disadvantage. In my testing I have seen a 2-3x increase in the time it takes to build search indexes. This might be solved down the road by having the user hint some relationships to eager load, but we can cross that bridge later.

nticaric commented 6 years ago

This seems legit at first look but has some downsides. Using the saveToIndex function directly will not update the total documents number which will affect the BM25 matching algorithm. Also, there is no speed gain comparing to php artisan scout:import App\\Article which includes everything from the toSearchableArray function. The tntsearch:import command is very fast because it leverages MYSQL_ATTR_USE_BUFFERED_QUERY and this is achieved by querying the db table directly

judahnator commented 6 years ago

For the total documents number, that should be as simple as adding this before the script finishes.

$indexer->updateInfoTable('total_documents', $model::count());

I will have to look into the rest, though. I have done some testing between scout:import and tntsearch:import and I am generally more satisfied with the results of the later. I will have to dig around and see why that might be.