laravel / scout

Laravel Scout provides a driver based solution to searching your Eloquent models.
https://laravel.com/docs/scout
MIT License
1.55k stars 329 forks source link

Typesense throwing errors whith latest v10.11.3 update #865

Closed Braunson closed 1 week ago

Braunson commented 3 weeks ago

Scout Version

10.11.3

Scout Driver

Typesense

Laravel Version

10.48.2

PHP Version

8.2.23

Database Driver & Version

No response

SDK Version

No response

Meilisearch CLI Version

No response

Description

The most recent MR #858 causes an issue with Typesense. For example, this code works:

$results = Names::search($this->keyword)
    ->options([
        'filter_by' => $this->buildFilterString(),
        'sort_by' => implode(',', $this->sortBy),
    ])
    ->paginate(48);

But if I drop in this line between options and paginate:

->query(fn (Builder $query) => $query->with('someRelation'))

We then get this error back from Typesense:

Only up to 250 hits can be fetched per page.

The only solution to solve that is to again set the per_page option to match the ->paginate() value.

It looks like adding back #824 resolves this.

Steps To Reproduce

  1. Install Laravel Scout, Typesense
  2. Setup model with thousands of records and import with Scout
  3. Setup model relation
  4. Use query above using code to re-test
github-actions[bot] commented 3 weeks ago

Thank you for reporting this issue!

As Laravel is an open source project, we rely on the community to help us diagnose and fix issues as it is not possible to research and fix every issue reported to us via GitHub.

If possible, please make a pull request fixing the issue you have described, along with corresponding tests. All pull requests are promptly reviewed by the Laravel team.

Thank you!

tharropoulos commented 3 weeks ago

This issue and #866 are tightly linked. The paginate function calls the getTotalCount function to get the total count of the results.

https://github.com/laravel/scout/blob/87906819fb773740f27f8b360df0aca525232015/src/Builder.php#L437

When the query callback is defined, whether empty or not, it will pass the null check: https://github.com/laravel/scout/blob/87906819fb773740f27f8b360df0aca525232015/src/Builder.php#L495-L497

As the count of the mapped ids will always be lesser than total count (the number of the ids will at most match the per page variable), it will go on to: https://github.com/laravel/scout/blob/87906819fb773740f27f8b360df0aca525232015/src/Builder.php#L501-L507

The issue lies at: https://github.com/laravel/scout/blob/87906819fb773740f27f8b360df0aca525232015/src/Builder.php#L504

The builder's limit will be null by default, so the ternary check fails and defaults to the totalCount, which would, in this case, be the total matching count, bar the filter/query. This in turn exceeds Typesense's maximum amount of 250 results per page and will execute a search to Typesense with per_page that will result in an error.

You can check this manually yourself. If you first use the take function (which sets the limit of the builder to the input), you won't get an error:

$results = Names::search($this->keyword)
    ->options([
        'filter_by' => $this->buildFilterString(),
        'sort_by' => implode(',', $this->sortBy),
    ])
    ->query(function ($query) {})
+   ->take(200)
    ->paginate(48);

This is the reverse of what #858 was doing. The problem here is that the keys function maps the ids using a search query using the absolute maximum limit (that being totalAmount). In turn, Typesense will return an error. The main thing to be discussed is the main purpose of this check: https://github.com/laravel/scout/blob/87906819fb773740f27f8b360df0aca525232015/src/Builder.php#L501-L507

We could introduce a function that takes care of this limitation by continuously querying by different page sizes until it reaches the end:

public function search(Builder $builder)
{
    // If the limit is within Typesense's capabilities, perform a single search
    if (is_null($builder->limit) || $builder->limit <= $this->maxResultsPerPage) {
        return $this->performSearch(
            $builder,
            $this->buildSearchParameters($builder, 1, $builder->limit ?? $this->maxResultsPerPage)
        );
    }

    // For larger result sets, we need to paginate
    return $this->performPaginatedSearch($builder);
}

protected function performPaginatedSearch(Builder $builder)
{
   // Sends all the paginated queries to the server and concats the results
}

However, this will introduce horrible performance for every pagination not limited by the user. What's the functionality behind https://github.com/laravel/scout/blob/87906819fb773740f27f8b360df0aca525232015/src/Builder.php#L501-L507 and should we preserve it? If yes, I'll have to look into we can optimize the paginatedSearch function.

tharropoulos commented 3 weeks ago

Meilisearch, for example, limits maximum results from an index at 1000 by default and that value must be manually changed to get more maximum results. If changed to fit the index's size, then it too would introduce performance issues, but that's an opt in. If a function were to be introduced to limit the maximum amount of results from a Typesense collection, then we should be able to mitigate the performance deficit we get by calling the Typesense server num_docs % 250. But that's all if the check on the builder class should be preserved.

jasonbosco commented 3 weeks ago

Algolia also limits max results to 1K for any searches, even with pagination.

If a function were to be introduced to limit the maximum amount of results from a Typesense collection

That sounds like a reasonable trade-off to me, especially to maintain parity with the other engines supported in Scout. With Typesense we can then let users configure this limit in Scout dynamically.

tharropoulos commented 3 weeks ago

Just posted a PR for this on #867, check it out when you get the chance

crynobone commented 1 week ago

PR has been merged