romanzipp / Laravel-Queue-Monitor

Monitoring Laravel Jobs with your Database
https://packagist.org/packages/romanzipp/laravel-queue-monitor
MIT License
693 stars 91 forks source link

Job stuck on Running #48

Closed ldmuniz closed 1 year ago

ldmuniz commented 3 years ago

Sometimes, when the job hit the timeout, another attempt is created and the original Job stucks on "Running". I suggest a little change in jobStarted function at romanzipp\QueueMonitor\Services\QueueMonitor.php

Original:

protected static function jobStarted(JobContract $job): void
    {
        if ( ! self::shouldBeMonitored($job)) {
            return;
        }

        $now = Carbon::now();

        $model = self::getModel();

        $model::query()->create(['job_id' => self::getJobId($job)],[            
            'name' => $job->resolveName(),
            'queue' => $job->getQueue(),
            'started_at' => $now,
            'started_at_exact' => $now->format(self::TIMESTAMP_EXACT_FORMAT),
            'attempt' => $job->attempts(),
        ]);
    }

Suggestion:

    protected static function jobStarted(JobContract $job): void
    {
        if ( ! self::shouldBeMonitored($job)) {
            return;
        }

        $now = Carbon::now();

        $model = self::getModel();

        $model::query()->updateOrCreate(['job_id' => self::getJobId($job)],[            
            'name' => $job->resolveName(),
            'queue' => $job->getQueue(),
            'started_at' => $now,
            'started_at_exact' => $now->format(self::TIMESTAMP_EXACT_FORMAT),
            'attempt' => $job->attempts(),
        ]);
    }
romanzipp commented 3 years ago

Sadly I can not reproduce this issue.

When executing the following test job which takes 30 seconds to run but specifying --timeout=5 with the artisan queue:work command, the Queue monitor will correctly mark the job as failed.

Maybe the job is hitting your configured max_execution_time threshold? In this case it's not possible to mark the previous attempt as failed since the PHP process will be killed immediately. The following attempt will also generate a new job_id

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use romanzipp\QueueMonitor\Traits\IsMonitored;

class TestJob implements ShouldQueue
{
    use Dispatchable;
    use InteractsWithQueue;
    use IsMonitored;
    use Queueable;
    use SerializesModels;
    use IsMonitored;

    public function handle()
    {
        sleep(30);
    }

    public function failed()
    {
        logger('TIMEOUT OCCURED');
    }
}

Screen Shot 2021-01-22 at 08 01 22

javis commented 3 years ago

Same happened to me, I have cases when a job would stop running (I think it is the OS killing the process, still looking into it) and the job would be stuck as running forever

juanparati commented 3 years ago

There another possible reason that keep some jobs stuck on running and is when jobs are distributed across multiple Laravel workers on different servers (distributed system).

See https://github.com/romanzipp/Laravel-Queue-Monitor/pull/68 for more details.

evaleries commented 2 years ago

I have the same issue with this. Instead of update the existing record, this package creates new row.

How to reproduce:

  1. Set a new jobs (pending jobs)
  2. Run the worker
  3. In the middle of running, interrupt the worker and close it
  4. Restart the worker

You'll see a new record in database. Take a look at my example below

screenshot of duplicate row and job stuck on running ![image](https://user-images.githubusercontent.com/48923153/137649386-734a1252-78ff-4c93-95b7-e03581bceae4.png) As you can see, there are 2 FakeJob with number 21. This package should've updated the FakeJob 21 instead creating a new one.

I think the given solution by @ldmuniz should work

romanzipp commented 2 years ago

I've looked into this issue and filed PR #83 which marks previous jobs with the same job id and status "running" as failed.

Could you try out this fix?

Screenshot 2021-10-18 at 07 03 32

@juanparati has submitted a similar PR #68 but i lost track of it since I could not reproduce it on my system.

romanzipp commented 2 years ago

I'd like to avoid reusing the same monitor instance and overriding the attributes or deleting the previous instance since it I'm currently working on a new database structure which takes stale jobs into consideration

evaleries commented 2 years ago

I've tested PR #83, and the problem still occurs. I suggest you mark the job as finished and set failed as true since we verify the job status is finished from the isFinished() method.