[Fixed] Queued jobs keep failing after the Elasticsearch service restarts

alirezaImani-f4L3e commented 6 months ago

Package version

I'm using v2.8.7 of the package.

Describe the bug

I'm using this package to insert the HTTP request info of every request that hits the application into Elasticsearch, and for that I'm using the package inside a queued job. In normal situations everything works fine. But when the Elasticsearch container goes down for a moment while users are using the application and then comes back up, the queued job still fails to connect to Elasticsearch. As a temporary workaround we have to run queue:restart to fix this.

I wonder if it's a bug in this package or in my application code.

To Reproduce

1. Create a queued job and, within that job, try to insert some data into Elasticsearch.
2. Stop the Elasticsearch container.
3. Dispatch the job; the queue reports an error when inserting into Elasticsearch.
4. Start the container again.
5. The job still fails even though the Elasticsearch container is working fine.

Expected behavior

We expect the application to continue inserting request logs into Elasticsearch once the Elasticsearch container is working correctly again.

pdphilip commented 6 months ago

This smells like a queue config issue. If ES is down all that will happen is that it will throw an exception (No alive nodes) & queues should catch these.

Are you using Horizon and Redis? Do you have supervisor in place to start up again if it goes down?

If you can point to a specific touch point that is causing this as a side effect because of the package then happy to continue here, else will close this in a few days time under the assumption that it falls out of the scope of this package.

I suggest asking the StackOverflow community for a hand using the queue issue as a starting point.

Also, consider upgrading to the 3.8.x version of the package - probably won't help you with this issue, but it's now the maintained version and far more advanced. Good luck!

alirezaImani-f4L3e commented 6 months ago

Yes, we are using Horizon and Redis, and we have a supervisor to bring the ES service back up.

When ES is down we get the (No alive nodes) exception, which is the expected behavior. But when we bring the ES service back up, we still get the (No alive nodes) exception.

While searching about this issue I came across a topic about guzzlehttp and CURLOPT_FORBID_REUSE, and I think it is probably responsible: the elasticsearch package may be reusing the connection that we were using when the ES service went down.

I'm also trying to examine this issue on 3.8.x ....

alirezaImani-f4L3e commented 6 months ago

I have tested v3.8.x and got the same result; the issue is still valid.

example-laravel-elasticsearch (laravel 11.x and laravel-elasticsearch 3.11.x)

I have created an example to show the problem. Please run this project using the docker compose file (Elasticsearch and Kibana services are included in docker-compose).

After running the app's migrations you can hit the / route to test inserting into Elasticsearch.

We have a simple job that takes a name and inserts it into an index in Elasticsearch.

Then stop the Elasticsearch service, hit the / route to create the job in the queue (the job will fail), and then start Elasticsearch again; you will still get the (no alive node available) exception.

pdphilip commented 6 months ago

Can you try simulating this by using the ES PHP client directly?

https://github.com/elastic/elasticsearch-php

Just make a job that runs a connection and query manually, then fail the container etc.

Let me know what you find

alirezaImani-f4L3e commented 6 months ago

I have tested the elasticsearch-php package and it worked fine after the ES service came back up.

sample

pdphilip commented 6 months ago

Last thing I'd like you to try before I commit time to this. Please try simulating the same with native MySQL. And, if possible, with MongoDB using https://github.com/mongodb/laravel-mongodb

Horizon caches most laravel settings and doesn't clear it until you restart Horizon. Could that point to something?

alirezaImani-f4L3e commented 6 months ago

I tried MySQL before (stopping the MySQL service and starting it again) and it works correctly with queued jobs.

I think the difference is the reconnection functionality in Illuminate\Database\Connection, which is implemented for PDO; we are not handling reconnection in the package.
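The "reconnect and try again" idea mentioned above can be sketched in plain PHP. This is only an illustration, not code from Laravel or from this package: ReconnectingClient, its factory argument, and the use of RuntimeException as the "connection lost" signal are all hypothetical names chosen for the example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical wrapper (not part of any library): keeps a client built by a
 * factory and rebuilds it once when a call fails, then retries that call.
 */
final class ReconnectingClient
{
    private ?object $client = null;

    /** @param Closure $factory returns a freshly built client object */
    public function __construct(private Closure $factory)
    {
    }

    /** Runs $operation with the client; on failure, rebuilds the client and retries once. */
    public function run(callable $operation): mixed
    {
        try {
            return $operation($this->client());
        } catch (RuntimeException $e) {
            // Assumption: RuntimeException stands in for the client's
            // connection error. Drop the possibly stale client and retry.
            $this->client = null;

            // A second failure is not caught here and reaches the caller.
            return $operation($this->client());
        }
    }

    /** Lazily builds the client and reuses it until it is dropped. */
    private function client(): object
    {
        return $this->client ??= ($this->factory)();
    }
}
```

A job would wrap its write in run(), so that a client left over from an outage is replaced on the first failure instead of staying in the worker until queue:restart.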

alirezaImani-f4L3e commented 6 months ago

I have also tried the laravel-mongodb package and it works perfectly in a queued job.

alirezaImani-f4L3e commented 6 months ago

I have updated the sample-project.

I implemented these three samples in the InsertToElastic job:

laravel-elasticsearch
elasticsearch-php
laravel-mongodb

pdphilip commented 6 months ago

Hey @alirezaImani-f4L3e - tried and cannot recreate the issue you're facing. Retrying failed jobs works for me:

Different to your sample I used predis though:

composer require predis/predis

and

REDIS_CLIENT=predis

CACHE_DRIVER=redis
QUEUE_CONNECTION=redis
SESSION_DRIVER=redis

Try that, maybe it's something but either way since we're both getting different results with the same code then this must be an environment issue. Can't be sure until I can see it. Sorry man

pdphilip commented 6 months ago

Just a side note, if you're deferring these to a job queue because these writes are slow, keep in mind that you can save/create without refreshing if you will not be working with that record immediately after. Speed is near instant.

See: https://elasticsearch.pdphilip.com/saving-models#fast-saves

But seeing your ES containers are going down perhaps queuing is safer

alirezaImani-f4L3e commented 6 months ago

It's not working even with predis.

Which command did you use (queue:listen or queue:work)?

queue:listen works fine because it doesn't cache the job code. We have the problem with queue:work.
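The difference between the two commands comes down to process lifetime: queue:work keeps one booted application in memory across jobs, while queue:listen starts a fresh process for each job. The sketch below models that in plain PHP. StickyClient and Container are hypothetical stand-ins, and the "remembers a dead node forever" behaviour is an assumption made only to reproduce the symptom described in this thread, not a statement about how the package works internally.

```php
<?php

declare(strict_types=1);

/** Hypothetical client that, once a call fails, keeps treating its node as dead. */
final class StickyClient
{
    private bool $nodeMarkedDead = false;

    public function index(bool $serviceUp): string
    {
        if (!$serviceUp) {
            $this->nodeMarkedDead = true;
        }
        if ($this->nodeMarkedDead) {
            throw new RuntimeException('No alive nodes');
        }

        return 'ok';
    }
}

/** Hypothetical container: hands out the same client for as long as it lives. */
final class Container
{
    private ?StickyClient $client = null;

    public function client(): StickyClient
    {
        return $this->client ??= new StickyClient();
    }
}

/** Runs one "job" and reports its outcome as a string. */
function runJob(Container $app, bool $serviceUp): string
{
    try {
        return $app->client()->index($serviceUp);
    } catch (RuntimeException $e) {
        return 'failed: ' . $e->getMessage();
    }
}

// queue:work style: one container (one process) serves every job.
$app = new Container();
$workResults = [runJob($app, true), runJob($app, false), runJob($app, true)];
// $workResults: ['ok', 'failed: No alive nodes', 'failed: No alive nodes']

// queue:listen style: every job gets a fresh container (fresh process).
$listenResults = [
    runJob(new Container(), true),
    runJob(new Container(), false),
    runJob(new Container(), true),
];
// $listenResults: ['ok', 'failed: No alive nodes', 'ok']
```

Under this model, queue:restart helps for the same reason queue:listen does: the worker process is replaced, so any state held in memory is discarded.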

Can you share your example?

pdphilip commented 6 months ago

php artisan horizon in local and prod

In prod I keep it running via supervisor settings

alirezaImani-f4L3e commented 6 months ago

If you don't mind, try starting the workers with queue:work.

pdphilip commented 6 months ago

So that seems to cache the connection and lock that in when it fails. I have no idea why it would do that.

If I rebuild the connection on every request then that solves this [very] specific issue. However, doing so slowed my test run time by 5-15%, which is significant. Given that, it's not viable to incur the performance cost to cover this edge case.

I'll leave this open anyway and look at building in a circuit breaker for failed calls. That should fix this issue without compromising performance, but it will need a significant upgrade to how the bridge handles errors.
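For readers unfamiliar with the pattern, a circuit breaker can be sketched in a few lines of plain PHP. This is a generic illustration, not the fix that shipped in the package (the thread does not show that code). The class name, the threshold and cooldown defaults, and the injectable clock are all assumptions made for the example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical minimal circuit breaker: after N consecutive failures it
 * rejects calls for a cooldown period, then lets one trial call through.
 */
final class CircuitBreaker
{
    private int $failures = 0;
    private ?float $openedAt = null;

    public function __construct(
        private int $failureThreshold = 3,
        private float $cooldownSeconds = 5.0,
        private ?Closure $clock = null, // injectable so tests can control time
    ) {
        $this->clock ??= static fn (): float => microtime(true);
    }

    /** Runs $operation unless the circuit is open; tracks success and failure. */
    public function call(callable $operation): mixed
    {
        if ($this->isOpen()) {
            throw new RuntimeException('Circuit open: skipping call');
        }

        try {
            $result = $operation();
        } catch (Throwable $e) {
            $this->recordFailure();
            throw $e;
        }

        // A success closes the circuit and clears the failure count.
        $this->failures = 0;
        $this->openedAt = null;

        return $result;
    }

    /** Open means: tripped, and the cooldown has not yet elapsed. */
    private function isOpen(): bool
    {
        if ($this->openedAt === null) {
            return false;
        }

        return (($this->clock)() - $this->openedAt) < $this->cooldownSeconds;
    }

    /** Counts a failure and (re)opens the circuit once the threshold is hit. */
    private function recordFailure(): void
    {
        $this->failures++;
        if ($this->failures >= $this->failureThreshold) {
            $this->openedAt = ($this->clock)();
        }
    }
}
```

The appeal over rebuilding the connection on every request is that the happy path costs only a null check; extra work happens only after failures. What to do when the trial call succeeds (for example, rebuilding a cached client) is left to the caller in this sketch.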

In the meantime, I suggest running Horizon directly and configuring a supervisor: https://laravel.com/docs/11.x/horizon#supervisor-configuration - assuming it's in scope of your project of course

pdphilip commented 6 months ago

@alirezaImani-f4L3e - Am bundling this in with the next release. Can you update dev branch and try again?

It should work now. Let me know so that I know to close this on release notes. Thanks.

alirezaImani-f4L3e commented 6 months ago

Yes, now it's working fine. Thank you for your attention.

Good job.

alirezaImani-f4L3e commented 6 months ago

Will you apply this fix to 3.8.x in the next release?

pdphilip commented 6 months ago

Indeed 👍

pdphilip commented 6 months ago

Hi @alirezaImani-f4L3e - 3.8.1 has been released with a fix for this

pdphilip / laravel-elasticsearch