I don't want to bother you, but I would at least appreciate a hint on how to debug this. As a workaround, I have to use single-threaded indexing, which slows things down a lot. A real shame, since the application is running on a 64-core Threadripper.
I also notice you seem to be from Kyiv? I hope you are not there right now and your loved ones are safe! Day after day, it's heart-wrenching to see what is happening in Ukraine. :( Our governments are NOT doing enough to stop this war, and every citizen I know is willing to suffer inconveniences. It would be delusional to think we could have a war in Europe today without consequences for every single one of us.
Hey! First of all, thank you for your support💙💛 I really appreciate all the people who try to help in any way they can and do not stand aside🤗
I would try to exclude some moving parts to narrow down the cause.
Do you have the same issue for other commands? I mean the same process IDs.
Could you use something else instead of pm2?
If the issue appears only for this package's command, then you can try disabling some code in the command (you will need to fork the package for that).
Maybe the issue appears because of job chaining: https://github.com/matchish/laravel-scout-elasticsearch/blob/8f0fffc318b527838de9a3712067cfcc21866ed3/src/Console/Commands/ImportCommand.php#L53
Changes in the pm2 configuration could also help:
https://github.com/matchish/laravel-scout-elasticsearch/issues/190#issuecomment-1100136780
You can also try to send the import job to the queue directly, without chaining (fork needed).
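Roughly like this in a fork (a sketch only; the queue name, and whether `Import` behaves correctly outside the wrapper, are assumptions to verify):

```php
use Illuminate\Support\Facades\Queue;
use Matchish\ScoutElasticSearch\Jobs\Import;

// Hypothetical fork of ImportCommand: push the Import job onto the queue yourself,
// bypassing the QueueableJob wrapper that the command normally chains it into.
Queue::pushOn(
    'scout',              // queue name: an assumption, use whatever your workers listen on
    new Import($source)   // $source: the import source the command already builds
);
```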
> Do you have the same issue for other commands? I mean the same process IDs.
No.
> Could you use something else instead of pm2?
No, too many dependencies. It is a very mature and stable process manager; I don't see how it could cause the problem.
> Changes in the pm2 configuration could also help.
I will look into that, maybe there are rules for running/starting/stopping processes that affect this case.
> Hey! First of all, thank you for your support💙💛 I really appreciate all the people who try to help in any way they can and do not stand aside🤗
You're welcome! It's a matter of course.
OK, I guess it's solved. The cause was simple after all.
It always worked with 1 worker. With 2 workers, I got the following:
Worker 0

```
[2022-04-20 08:52:37][dU53PW37lcxbwCXMXxIbkPhPzoGOev7c] Processing: Matchish\ScoutElasticSearch\Jobs\QueueableJob
[2022-04-20 08:55:05][dU53PW37lcxbwCXMXxIbkPhPzoGOev7c] Processed: Matchish\ScoutElasticSearch\Jobs\QueueableJob
```

Worker 1

```
[2022-04-20 08:54:07][dU53PW37lcxbwCXMXxIbkPhPzoGOev7c] Processing: Matchish\ScoutElasticSearch\Jobs\QueueableJob
[2022-04-20 08:54:07][dU53PW37lcxbwCXMXxIbkPhPzoGOev7c] Failed: Matchish\ScoutElasticSearch\Jobs\QueueableJob
```
The attempt picked up by worker 1 fails immediately. Notably, it always starts exactly 90 seconds after worker 0 started processing the job.
I didn't know about the `retry_after` setting in the queue config, which was set to 90 seconds. Even after I found it, I interpreted it as: "retry a failed job after that number of seconds." Actually, as per the documentation, `retry_after` releases a job back onto the queue even while it is still running. When the second worker then picked up the still-running job, that attempt failed immediately.
> In your config/queue.php configuration file, each queue connection defines a retry_after option. This option specifies how many seconds the queue connection should wait before retrying a job that is being processed. For example, if the value of retry_after is set to 90, the job will be released back onto the queue if it has been processing for 90 seconds without being released or deleted. Typically, you should set the retry_after value to the maximum number of seconds your jobs should reasonably take to complete processing.
(https://laravel.com/docs/9.x/queues)
Why does it always work with exactly one worker? I guess a worker has to be available to retry the job. The Laravel docs do not mention (afaik) that `retry_after` is effectively irrelevant if only one worker is running.
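For anyone running into the same thing, the relevant setting lives in config/queue.php. A minimal sketch (the connection name and numbers are just examples, not a recommendation):

```php
// config/queue.php (excerpt) — retry_after must exceed the runtime of your slowest job,
// otherwise a second worker re-grabs the still-running job as soon as the timer expires.
'connections' => [
    'redis' => [
        'driver'      => 'redis',
        'connection'  => 'default',
        'queue'       => env('REDIS_QUEUE', 'default'),
        'retry_after' => 3600, // was 90 — the import runs far longer than that
        'block_for'   => null,
    ],
],
```

The docs also recommend keeping the worker's `--timeout` a few seconds below `retry_after`, so a stuck worker is killed before its job is released back onto the queue.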
Two questions:
While investigating the code, I noticed that the import does not use parallelization. The job `\Matchish\ScoutElasticSearch\Jobs\Import` uses import stages from `\Matchish\ScoutElasticSearch\Jobs\ImportStages::fromSource`. There, the data is chunked and all chunks are processed within those stages, i.e. the chunks are not distributed over several jobs. Why?
And am I assuming correctly that `$job = (new QueueableJob())->chain([$job]);` is used to prevent an exception that would occur when the job tries to write to a non-existent output stream?
I don't remember the details, but you're right: Import was designed to run in parallel, but at some point there were issues with consistency because of overlapping pages, chunking performance, stage execution order, etc.
I have a proof of concept here and it worked the last time I checked (no progress report yet). So if you have the time and are willing, be my guest and try it) A PR is highly welcome as well: https://github.com/matchish/laravel-scout-elasticsearch/issues/78
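Very roughly, the shape a parallel import could take (purely illustrative, not the PoC's code; `ImportChunk` and the ranges are made up):

```php
use Illuminate\Support\Facades\Bus;

// Purely illustrative: one queued job per key range, dispatched as a batch so the
// ranges cannot overlap and the index switch happens only after everything has finished.
$ranges = [[1, 50000], [50001, 100000]]; // precomputed chunk boundaries

$jobs = collect($ranges)
    ->map(fn ($range) => new ImportChunk($source, $range[0], $range[1])); // hypothetical job, would need the Batchable trait

Bus::batch($jobs->all())
    ->then(function () {
        // e.g. switch the read alias to the freshly built index here
    })
    ->dispatch();
```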
Chaining jobs is just a way to avoid an `if` and to work with any job in the same way: if it's not a queueable job it will be executed immediately, otherwise it will go to a queue.
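Illustrative only (not the package's exact code):

```php
use Matchish\ScoutElasticSearch\Jobs\QueueableJob;

// With the wrapper and chaining, every job goes through the same code path;
// dispatch() queues a job that implements ShouldQueue and runs any other job immediately,
// so the command does not need to care which case it is.
$job = (new QueueableJob())->chain([$job]);
dispatch($job);

// Without it, the command would need a branch like this every time it starts a job:
if (config('scout.queue')) {
    dispatch($importJob);      // picked up by a queue worker
} else {
    dispatch_sync($importJob); // executed right away in the console process
}
```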
> chunking performance
I noticed, too, that Laravel's native `chunk()` gets incrementally slower with each chunk. I had some use cases where it became too slow to be usable. It seems to come down to how MySQL handles growing OFFSETs: every later chunk has to scan and throw away all the rows before it.
I am overwhelmed by work, so I do not think I will contribute to the code anytime soon. But you may be interested in a quick solution I wrote to get performant chunking:
```php
// Keyset-based chunking: collect every n-th key up front, then select each chunk
// by key range instead of via OFFSET, so later chunks do not get slower.
public static function speedChunk($query, $chunkSize, $callback, $keyName = 'id')
{
    // Every $chunkSize-th key (ordered ascending) marks a chunk boundary.
    $idQuery = clone $query;
    $ids = $idQuery->orderBy($keyName)
        ->pluck($keyName)
        ->nth($chunkSize);

    for ($i = 0; $i < count($ids); ++$i) {
        // The first chunk includes its lower boundary; later chunks start just above theirs.
        $chunkQuery = (clone $query)->where($keyName, ($i ? '>' : '>='), $ids[$i]);
        // Upper boundary, except for the last chunk, which takes the remainder.
        if (isset($ids[$i + 1])) {
            $chunkQuery->where($keyName, '<=', $ids[$i + 1]);
        }
        $callback($chunkQuery->get());
    }
}
```
(old code, without types ;) )
This does not get slower with each chunk.
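For example, roughly how I call it (the class and model names are just placeholders):

```php
// Hypothetical usage of the speedChunk() helper above: re-index a model in chunks of 1,000.
QueryHelper::speedChunk(\App\Models\Product::query(), 1000, function ($products) {
    $products->searchable(); // Scout: push this chunk of models to the search index
});
```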
I will compare my solution with this one when I come back to the parallel import. Thanks)
IMHO, this issue could be closed.
Version
Laravel: 8.62.0
Scout: 9.2.10
Package: 5.0.1
Problem
I get entries in the failed_jobs table like:
```
Illuminate\Queue\MaxAttemptsExceededException: Matchish\ScoutElasticSearch\Jobs\QueueableJob has been attempted too many times or run too long.
```
I don't think it's actually a timeout, because the payload contains `timeout: null` and `failOnTimeout: false`. I spent hours searching for the cause and even logged the responses of Elasticsearch. Looking into worker.log, which is maintained by pm2, I see:
It strikes me as odd that a process is started twice with the same id. The process started at 09:48:50 fails immediately (and is written to failed_jobs). I can see in `pm2 monit` that one queue worker is doing work until the log entry at 10:05:17 confirms that the job was processed. This also shows that there is no timeout problem and the job actually works. I guess it's tough, even for you, to diagnose this behaviour. Could you at least point out how I could debug this problem better and narrow down the cause?
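(For reference, this is roughly how I read those values out of the payload — a quick `php artisan tinker` sketch; the table and key names are just the Laravel defaults.)

```php
// Inspect the newest entry in failed_jobs and dump the timeout/attempt settings
// that Laravel stored in the job payload.
$payload = json_decode(
    \DB::table('failed_jobs')->latest('id')->value('payload'),
    true
);
dump($payload['timeout'] ?? null, $payload['maxTries'] ?? null, $payload['failOnTimeout'] ?? null);
```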