laravel / scout

Laravel Scout provides a driver based solution to searching your Eloquent models.
https://laravel.com/docs/scout
MIT License
1.55k stars 329 forks source link

Typesense Import: Index Race Condition #845

Closed saibotk closed 2 months ago

saibotk commented 3 months ago

Scout Version

10.9.0

Scout Driver

Typesense

Laravel Version

10.48.10

PHP Version

8.3.9

Database Driver & Version

Typesense 0.25.2

SDK Version

Typesense PHP 4.9.3

Meilisearch CLI Version

No response

Description

When deleting the index and afterwards importing all models, we can reach a race condition, where the import fails because some jobs run in parallel and all try to create or access the index.

Only one succeeds and the others will, for example, throw this error: Typesense\Exceptions\ObjectNotFound: Not Found or interestingly, only rarely, Typesense\Exceptions\ObjectAlreadyExists: A collection with namealready exists..

This happens when using queued imports. I don't exactly understand why it returns Not Found instead of the more rare collection already exists error, which is logical, but this might be some technical detail in the typesense engine.

E.g. using Horizon with minProcesses = 5 and running the following command on a table with enough entries e.g. ~ 40.000 will sometimes result in those exceptions on some of the jobs.

We should try to make the import more robust and maybe ensure that the index is created before the documents are imported? I'm happy to hear your solutions on this, but i think we should fix this flaw in scout directly.

Steps To Reproduce

In any scout repository with Horizon or some queue workers and a table with enough elements such that chunking happens, run:

php artisan scout:delete-index "App\Models\User" && php artisan scout:import "App\Models\User"

And sometimes the race condition will cause some jobs to fail.

driesvints commented 3 months ago

@jasonbosco @AbdullahFaqeir

saibotk commented 3 months ago

Seems like #822 is related

saibotk commented 3 months ago

I just did some more deep-diving:

A small workaround: When calling artisan scout:flush or scout:delete-index run this afterwards (maybe create a wrapper artisan command):

use Laravel\Scout\EngineManager;
use App\Models\Commodity;

$ts = app(EngineManager::class)->engine();
$model = new Commodity();

$ts->flush($model);
$ts->getCollections()->offsetUnset($model->searchableAs());

Note: This will only happen in queue workers, where i suppose the engine is kept alive? because the process is not restarted after a job? PS: Yes, so this is mainly a caching / state management issue, due to the collection exists cache: CleanShot 2024-07-08 at 17 15 38

driesvints commented 3 months ago

Thanks a lot for your investigation @saibotk. If you're up for it, definitely feel free to send in a PR with a proposed change.

jasonbosco commented 3 months ago

@saibotk thank you for digging into this. Let me know if you're planning to do a PR for this.

If not, @tharropoulos from the Typesense team is also available to look into this.

saibotk commented 3 months ago

I could try to come up with some fix for this, but am currently pretty swamped. Also I think it is necessary to see how this might need some broader change. Because currently i don't know how the cached exist came to be and in which way we need to preserve the functionality of that caching.

Obviously we could just drop the cache and always do an exists query but im also not sure if there might be some better solution using other typesense methods.

Im open to discuss approaches and would appreciate any insights

tharropoulos commented 2 months ago

When I tried to take a look at the server logs, if you use queue:work, these requests are sent: image

If you then restart the worker and execute the import once again, these requests are sent to Typesense: image

If you then use queue:listen, these requests are sent, and the jobs don't fail. image

It must be applicable to the static state generated when flushing the collection, since the change is reflected on the queue:listen command. I'm still looking into it, as I'm not sure what's the best way to approach this would be. I'm not sure we should drop the cache as @saibotk already mentioned.

EDIT: provided some more info regarding the differences between queue:listen and queue:work

tharropoulos commented 2 months ago

The getOrCreateCollectionFromModel method in TypesenseEngine is not calling out to the API, having its state of collections as a single source of truth:

https://github.com/laravel/scout/blob/31742191dc3dca22499c1ae47dde238f211c36a4/src/Engines/TypesenseEngine.php#L497-L505

To ensure matching the state from the server to the worker, if a collection is found, it should then call out to the Typesense server to see if the collection exists on the server as well, if not, it will have to create it on the server.

    protected function getOrCreateCollectionFromModel($model, bool $indexOperation = true): TypesenseCollection
    {
        $method = $indexOperation ? 'indexableAs' : 'searchableAs';

        $collection = $this->typesense->getCollections()->{$model->{$method}()};

        // Initialize a flag to check if the collection exists
        $collectionExists = false;
        if ($collection->exists()) {
            $collectionName = $model->{$method}();
            try  {
                $this->typesense->collections[$collectionName]->retrieve();
                $collectionExists = true;
            } catch (TypesenseClientError $e){
                $collectionExists = false;
            }
        }

        // Use the flag to determine the next steps
        if ($collectionExists) {
            return $this->typesense->getCollections()->{$collectionName};
        }
...
tharropoulos commented 2 months ago

@saibotk First of all, thank you for taking the time to investigate so thoroughly and getting us in the right track. Could you verify that this change fixes the problems on your side as well?

EDIT: fix typo

saibotk commented 2 months ago

Yes i will do so on friday when im back at work thanks for taking the time 🙌

saibotk commented 2 months ago

@tharropoulos Okay got time to test this now and thats my result:

What do you think?

Thank you for the quick response!

driesvints commented 2 months ago

Just checking in after being out for a bit: was this fixed now?

saibotk commented 2 months ago

@driesvints Yes the issue itself is fixed and this can be closed, the followup PR is just to improve the fix. Thanks!

patrickomeara commented 2 months ago

I just tried to import ~1.6m rows in chunks of 500 and hit this error 431 times before hitting a memory issue on a node.

I'm on TypeSense cloud v26 and Scout v10.11. I ran the import directly after a flush.

Is there a recommended way to do an initial import. I'll turn the queue off and try again once I have upgraded my nodes in TypeSense Cloud.

saibotk commented 2 months ago

Weird, the main issue should be fixed in that version.

Though can you try out the latest commit? I'm curious whether that fixes something for your use case @patrickomeara.

patrickomeara commented 2 months ago

OK, after the node upgrade had finished I flushed and re-imported and only got the ObjectAlreadyExists exception once. It could be that my horizon configuration overwhelmed the smaller TypeSense cluster's ingest capabilities for the first few minutes. I doubled the memory and it imported fine.

Thanks for your swift reply @saibotk

liran-co commented 1 month ago

I believe the issue reported here still exists after #846. If you import greater than the chunk size (500) on a queue server, there exists a race condition that can cause the Typesense\Exceptions\ObjectAlreadyExists: A collection with name already exists. error. Note that this occurs even on a fresh import without a flush before.

liran-co commented 1 month ago

Throwing a try/catch around the collection creation call at line 526 fixes the issue. It occurs when two chunks are picked up by two queue processes at the same time, causing a race condition:

        try {
            $this->typesense->getCollections()->create($schema);

            $collection->setExists(true);
        } catch (TypesenseClientError $e) {
            $collection->retrieve();

            $collection->setExists(true);
        }