laravel / scout

Laravel Scout provides a driver based solution to searching your Eloquent models.
https://laravel.com/docs/scout
MIT License

Exception when searching large datasets for common query using paginate #807

Closed razvaniacob closed 5 months ago

razvaniacob commented 6 months ago

Scout Version

10.8

Scout Driver

Typesense

Laravel Version

10.44.0

PHP Version

8.2.13

Description

When searching for a string that is relatively common throughout a collection of indexed data, the results cannot be fetched or displayed because the search with the Scout driver throws the following exception:

 Typesense\\Exceptions\\ObjectUnprocessable(code: 0): Only upto 250 hits can be fetched per page

Steps to reproduce

  1. Start a fresh Laravel project, install Scout and the Typesense Scout driver according to documentation
  2. Add a model and make it searchable
  3. Add a large dataset (~ 50000+ records with "lorem ipsum" text)
  4. Query the dataset and try to display the results in a paginated way:

    $items = Item::search($request->input('query', ''))
        ->paginate(10);

Expected Behavior

I expect the search to succeed, even when there are more than 250 hits, because that's why I use pagination on my frontend.

Actual Behavior

The following exception is thrown:

 Typesense\\Exceptions\\ObjectUnprocessable(code: 0): Only upto 250 hits can be fetched per page

karakhanyans commented 6 months ago

@razvaniacob working on it. 👌

driesvints commented 6 months ago

@jasonbosco do you maybe know more about this?

jasonbosco commented 6 months ago

@driesvints Typesense has a max of 250 hits per page, and beyond that we'd have to use the page parameter to fetch additional pages. It looks like we're not managing this limit automatically within the Scout driver's pagination mechanism.

@karakhanyans is looking into this.
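
To illustrate the limit described above, here is a minimal sketch of how a larger result set could be split into requests that each stay within the 250-hit cap. The function `planPageRequests` and the constant `MAX_PER_PAGE` are hypothetical names used for illustration only; they are not part of Scout or the Typesense client, and only the `page` and `per_page` parameter names come from the discussion.

```php
<?php

// Cap quoted in the exception message ("Only upto 250 hits can be fetched per page").
const MAX_PER_PAGE = 250;

/**
 * Hypothetical helper: plan the (page, per_page) requests needed to cover
 * $totalWanted hits without ever asking for more than $maxPerPage at once.
 * Every request uses the same per_page so that page offsets line up.
 */
function planPageRequests(int $totalWanted, int $maxPerPage = MAX_PER_PAGE): array
{
    $requests = [];
    $pages = (int) ceil($totalWanted / $maxPerPage);

    for ($page = 1; $page <= $pages; $page++) {
        $requests[] = ['page' => $page, 'per_page' => $maxPerPage];
    }

    return $requests;
}
```

Under these assumptions, covering 600 hits takes three requests rather than a single request with `per_page` set to 600, which is the kind of request the exception rejects.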

karakhanyans commented 6 months ago

Hi @razvaniacob

I set up a fresh project and could not reproduce the issue you are having.

  1. Set up the Typesense driver
  2. Added Searchable to the User model
  3. Configured the schema
  4. Imported 10K users with factories
  5. Ran a search with pagination: User::search('m')->paginate(10);

The results worked fine.

Here is the repo where I did all that. The results are returned by the /users endpoint.

https://github.com/karakhanyans/laravel-scout-typesense

Could you please fork the repo, add the steps that you took, and push them so I can reproduce the error?

Thanks.

razvaniacob commented 6 months ago

Thanks @karakhanyans,

I've tested with your code and it works.

So then I started experimenting with my own code. When I do something like this:

return ImportedProperty::search('pipera')->paginate(10)->onEachSide(1)->withQueryString()
    ->through(fn ($obj) => [
        'name' => $obj->source_id,
    ]);

It works, but if I do it like this:

return ImportedProperty::search('pipera')
    ->query(fn (Builder $query) => $query->with(['imported_district', 'neighbourhood']))
    ->paginate(10)->onEachSide(1)->withQueryString()
    ->through(fn ($obj) => [
        'name' => $obj->source_id,
        'district' => $obj->imported_district?->name ?? ($obj->neighbourhood?->name ?? ''),
    ]);

It fails with the error shown in this screenshot:

[Image: Screenshot 2024-02-27 at 4 15 29 PM]

Maybe it has something to do with the ->query(fn (Builder $query) => $query->with(['imported_district', 'neighbourhood'])) line?

Any thoughts?

karakhanyans commented 6 months ago

@razvaniacob have you tried calling ->with(['imported_district', 'neighbourhood']) directly instead of ->query(fn (Builder $query) => $query->with(['imported_district', 'neighbourhood'])), without putting it inside query, as it's just a ->with?

razvaniacob commented 6 months ago

I followed the documentation found here: Laravel Scout Documentation

If I use just ->with... I get this error: Method Laravel\Scout\Builder::with does not exist.

thannaske commented 6 months ago

This is an issue I also encountered, see: https://github.com/typesense/laravel-scout-typesense-driver/issues/86

errand commented 5 months ago

Hey guys, I'm getting the same issue when using the query Builder

$posts = Post::search($text)
                ->query(fn (Builder $query) => $query->with(['images:id,name,path,extension']))

Removing it fixed the issue

github-actions[bot] commented 5 months ago

Thank you for reporting this issue!

As Laravel is an open source project, we rely on the community to help us diagnose and fix issues as it is not possible to research and fix every issue reported to us via GitHub.

If possible, please make a pull request fixing the issue you have described, along with corresponding tests. All pull requests are promptly reviewed by the Laravel team.

Thank you!

davidstoker commented 5 months ago

I just ran into this as well when using paginate() with query(). Looks to me like the root cause is how getTotalCount skips using the engine's result if a queryCallback exists. That causes it to call take with the $totalCount result if there is no limit. That ends up calling Typesense with a per_page value of $totalCount triggering the error.

https://github.com/laravel/scout/blob/6e5b47dd6ff1d397fcb78fb2c8cce80e4efc5e4a/src/Builder.php#L480-L509

As a workaround, because of the is_null($this->limit) ? $totalCount : min($this->limit, $totalCount) check, setting a limit in parallel to the paginate call results in a per_page value that's controlled.

For example, this should work around it without affecting pagination, since pagination doesn't look at the $limit value.

$items = Item::search($request->input('query', ''))
    ->take(10)
    ->paginate(10);
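
To make the effect of the workaround concrete, here is a small standalone sketch of the expression quoted above. The function name `effectiveTake` and the constant `TYPESENSE_MAX_PER_PAGE` are hypothetical and exist only for this illustration; they are not Scout's API. The sketch assumes, as described in this comment, that the resulting value is what ends up being sent as `per_page`.

```php
<?php

// Cap quoted in the exception message; the constant name is illustrative.
const TYPESENSE_MAX_PER_PAGE = 250;

/**
 * Hypothetical stand-in for the value passed to take() when a query
 * callback is present: the total count when no limit is set, otherwise
 * the smaller of the limit and the total count.
 */
function effectiveTake(?int $limit, int $totalCount): int
{
    return is_null($limit) ? $totalCount : min($limit, $totalCount);
}

// Compare the two cases for a large result set: no limit vs. take(10).
foreach ([null, 10] as $limit) {
    $perPage = effectiveTake($limit, 50000);
    $status = $perPage > TYPESENSE_MAX_PER_PAGE ? 'over the cap' : 'within the cap';

    printf("limit=%s per_page=%d (%s)\n", var_export($limit, true), $perPage, $status);
}
```

Without a limit, the value grows with the total number of matches, which is why only common search terms trigger the exception; with `take(10)` it stays at 10 no matter how many records match.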

It's not clear to me why the existence of the queryCallback should cause the $totalCount returned by the engine to be ignored. Why should it force falling back to the query builder for the count if the query callback's purpose is to be "invoked after the relevant models have already been retrieved from your application's search engine", as described in the docs?

karakhanyans commented 5 months ago

@davidstoker thanks for your input, this helped a lot with solving this. @razvaniacob

cc: @driesvints @jasonbosco

https://github.com/laravel/scout/pull/817