laravel / framework

The Laravel Framework.
https://laravel.com
MIT License

Laravel 5. PDOException. QUEUE_DRIVER=database 1213 Deadlock #7046

Closed easmith closed 8 years ago

easmith commented 9 years ago

I use QUEUE_DRIVER=database When I run more than 10 workers, I get the following error:

'PDOException' with message 'SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction
[2015-01-19 16:45:00] production.ERROR: exception 'PDOException' with message 'SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction' in /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php:380
Stack trace:
#0 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(380): PDOStatement->execute(Array)
#1 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(606): Illuminate\Database\Connection->Illuminate\Database\{closure}(Object(Illuminate\Database\MySqlConnection), 'update `jobs` s...', Array)
#2 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(570): Illuminate\Database\Connection->runQueryCallback('update `jobs` s...', Array, Object(Closure))
#3 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(383): Illuminate\Database\Connection->run('update `jobs` s...', Array, Object(Closure))
#4 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(328): Illuminate\Database\Connection->affectingStatement('update `jobs` s...', Array)
#5 /home/www/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1747): Illuminate\Database\Connection->update('update `jobs` s...', Array)
#6 /home/www/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(181): Illuminate\Database\Query\Builder->update(Array)
#7 /home/www/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(146): Illuminate\Queue\DatabaseQueue->releaseJobsThatHaveBeenReservedTooLong('queuename1')
#8 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(180): Illuminate\Queue\DatabaseQueue->pop('queuename1')
#9 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(150): Illuminate\Queue\Worker->getNextJob(Object(Illuminate\Queue\DatabaseQueue), 'queuename1')
#10 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(113): Illuminate\Queue\Worker->pop(NULL, 'queuename1', '0', '3', '3')
#11 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(85): Illuminate\Queue\Worker->runNextJobForDaemon(NULL, 'queuename1', '0', '3', '3')
#12 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(101): Illuminate\Queue\Worker->daemon(NULL, 'queuename1', '0', '256', '3', '3')
#13 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(67): Illuminate\Queue\Console\WorkCommand->runWorker(NULL, 'queuename1', '0', '256', true)
#14 [internal function]: Illuminate\Queue\Console\WorkCommand->fire()
#15 /home/www/vendor/laravel/framework/src/Illuminate/Container/Container.php(523): call_user_func_array(Array, Array)
#16 /home/www/vendor/laravel/framework/src/Illuminate/Console/Command.php(114): Illuminate\Container\Container->call(Array)
#17 /home/www/vendor/symfony/console/Symfony/Component/Console/Command/Command.php(253): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#18 /home/www/vendor/laravel/framework/src/Illuminate/Console/Command.php(100): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#19 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(874): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#20 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(195): Symfony\Component\Console\Application->doRunCommand(Object(Illuminate\Queue\Console\WorkCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#21 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(126): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#22 /home/www/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(91): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#23 /home/www/artisan(34): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#24 {main}

Next exception 'Illuminate\Database\QueryException' with message 'SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (SQL: update `jobs` set `reserved` = 0, `reserved_at` = , `attempts` = attempts + 1 where `queue` = queuename1 and `reserved` = 1 and `reserved_at` <= 1421664300)' in /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php:614
Stack trace:
#0 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(570): Illuminate\Database\Connection->runQueryCallback('update `jobs` s...', Array, Object(Closure))
#1 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(383): Illuminate\Database\Connection->run('update `jobs` s...', Array, Object(Closure))
#2 /home/www/vendor/laravel/framework/src/Illuminate/Database/Connection.php(328): Illuminate\Database\Connection->affectingStatement('update `jobs` s...', Array)
#3 /home/www/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1747): Illuminate\Database\Connection->update('update `jobs` s...', Array)
#4 /home/www/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(181): Illuminate\Database\Query\Builder->update(Array)
#5 /home/www/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(146): Illuminate\Queue\DatabaseQueue->releaseJobsThatHaveBeenReservedTooLong('queuename1')
#6 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(180): Illuminate\Queue\DatabaseQueue->pop('queuename1')
#7 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(150): Illuminate\Queue\Worker->getNextJob(Object(Illuminate\Queue\DatabaseQueue), 'queuename1')
#8 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(113): Illuminate\Queue\Worker->pop(NULL, 'queuename1', '0', '3', '3')
#9 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(85): Illuminate\Queue\Worker->runNextJobForDaemon(NULL, 'queuename1', '0', '3', '3')
#10 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(101): Illuminate\Queue\Worker->daemon(NULL, 'queuename1', '0', '256', '3', '3')
#11 /home/www/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(67): Illuminate\Queue\Console\WorkCommand->runWorker(NULL, 'queuename1', '0', '256', true)
#12 [internal function]: Illuminate\Queue\Console\WorkCommand->fire()
#13 /home/www/vendor/laravel/framework/src/Illuminate/Container/Container.php(523): call_user_func_array(Array, Array)
#14 /home/www/vendor/laravel/framework/src/Illuminate/Console/Command.php(114): Illuminate\Container\Container->call(Array)
#15 /home/www/vendor/symfony/console/Symfony/Component/Console/Command/Command.php(253): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#16 /home/www/vendor/laravel/framework/src/Illuminate/Console/Command.php(100): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#17 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(874): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#18 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(195): Symfony\Component\Console\Application->doRunCommand(Object(Illuminate\Queue\Console\WorkCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#19 /home/www/vendor/symfony/console/Symfony/Component/Console/Application.php(126): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#20 /home/www/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(91): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#21 /home/www/artisan(34): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#22 {main}  
GrahamCampbell commented 9 years ago

If you're running 10 workers, then the database driver isn't right for you. I'd suggest using a driver actually designed for queuing, like beanstalkd. The database driver is there for people who want a simple, quick queuing setup for a low number of jobs.
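For context, switching drivers in Laravel 5 is a config change: set QUEUE_DRIVER=beanstalkd in .env and give beanstalkd a connection entry. A minimal sketch (host and ttr values are placeholders; the beanstalkd driver also needs the pda/pheanstalk Composer package):

```php
// config/queue.php (fragment)
'default' => env('QUEUE_DRIVER', 'sync'),

'connections' => [
    'beanstalkd' => [
        'driver' => 'beanstalkd',
        'host'   => 'localhost', // placeholder: your beanstalkd host
        'queue'  => 'default',
        'ttr'    => 60, // seconds a job may stay reserved before beanstalkd re-releases it
    ],
],
```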

alexglue commented 9 years ago

@GrahamCampbell I've got deadlocks with just 5 workers. Is that too much for the db driver too?

bequadrat commented 9 years ago

I get deadlocks even with 3 workers running for different queues each :-/

GrahamCampbell commented 9 years ago

I get deadlocks even with 3 workers running for different queues each :-/

You can only use 1 worker if you want to use the database driver. If you need anything more, you should probably check out a real solution designed for queuing, like beanstalkd, or something that supports first in, first out, like redis. Both are fine choices for queue drivers.

alexglue commented 9 years ago

@GrahamCampbell Maybe you just need to put a process ID in the jobs table, for example, and read its value from artisan:worker to avoid deadlocking?

ryanhungate commented 8 years ago

Hey @GrahamCampbell and @taylorotwell, I am getting the same error here. Up until recently I've been using the beanstalkd driver (works like a charm, really), but I would love to take advantage of this new database driver... it's so much cleaner to manage jobs, in my opinion, and easier to display what's 'in the queue' to users if you wanted to do that.

First off, I have extended the database driver to also put a 'key' on the queue jobs while pushing them in... and this has been super helpful for me. Does anyone have an idea on how we could use this in a production environment that absolutely needs multiple queue runners? Think of scaling out on this one, it has to support any number of runners that we need to keep our services running smoothly.

20TRIES commented 8 years ago

Took a few days of attempted debugging before coming across this post; it may be worth mentioning in the documentation to only use one worker with the database queue? It would have definitely saved me a lot of time.

bbthorntz commented 8 years ago

Also getting deadlocks with multiple database workers. It seems strange that the documentation suggests creating 8 processes within the supervisor config, yet it's recommended here to only use one (or am I missing the point?).

hayesrobin commented 8 years ago

Same here! Also, CPU usage hits 100%.

rabbitfang commented 8 years ago

I ran into this as well (the documentation should be fixed). It seems that using queue:listen instead of queue:work --daemon removes the issues caused by the deadlocks. They still occur, but they don't cause any problems for me, as the listener automatically triggers a retry. I am running 8 queue listeners.

taylorotwell commented 8 years ago

You shouldn't be limited to 1 worker with the database driver, despite what @GrahamCampbell said. Don't know why he said that.

Anyways, hopefully we can figure this out. I've seen it before myself when running quite a few database workers.

taylorotwell commented 8 years ago

The issue is in the query that migrates jobs that have been reserved too long back to being available. We have a bunch of workers trying to execute that query simultaneously. We could simply ignore failures of that query and assume another worker is already doing it. I'm curious: even when you get deadlocks, do your queues keep processing as normal?

GrahamCampbell commented 8 years ago

@taylorotwell Databases aren't built for this sort of application, which is why we'll get these sorts of issues at scale. I think you're totally correct that the queue will still work just fine, even with these errors. They'll just get progressively worse the more workers you add, and eventually it will just totally lock up. Say you pushed a million jobs to the queue and had 1000 db workers...

rabbitfang commented 8 years ago

@GrahamCampbell is correct in that DBs (specifically for me, MySQL) do not handle task queues well, due to the nature of row/table locks. When running queue:listen (backed by supervisord), Laravel was able to continue processing the queue, even with the deadlocks. When using queue:work --daemon, the entire DB locked up; stopping the workers was not enough to clear the deadlocks in the DB, and a full MySQL restart was needed. I'm not sure why queue:work --daemon locked up the DB, but it might have had to do with the speed at which it was able to process tasks.

Ignoring the deadlocks errors would help. Additionally, it would be useful to reduce the frequency at which expired reservations are checked, in a similar fashion to PHP & Laravel's session garbage collector. At the very least, the behavior should be documented.

taylorotwell commented 8 years ago

I agree that there might should be some sort of “lottery” or a different way to migrate the reserved jobs that need to be made available again.


taylorotwell commented 8 years ago

Would definitely be interested in hacking on ideas with people if anyone is interested. A few thoughts come to mind. When using the DB driver, definitely set the --sleep option to something reasonable so it's not just pinging your database constantly, especially if you don't have a ton of jobs.

Most likely we need to rethink how jobs are migrated entirely. I would be curious to just comment out that code entirely and see if people still have problems with the DB locking up, so that we can isolate it as the actual problem. Can anyone try this?

I have pushed code to the 5.2 branch (dev stability) to just swallow those deadlock exceptions. If we get an exception on the job-migration query, we'll just assume another queue listener is handling it and move along.
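The swallow-and-move-on approach hinges on telling deadlocks apart from other query failures. Laravel does this by inspecting the exception message; here is a standalone sketch of such a check (the function name and message list are illustrative, not the framework's exact implementation):

```php
<?php

// Sketch: detect whether a query exception was caused by a deadlock, so the
// caller can skip the job-migration query and assume another worker won the
// race. The driver message fragments below are examples, not an exhaustive list.
function causedByDeadlock(\Exception $e)
{
    $needles = [
        'Deadlock found when trying to get lock', // MySQL error 1213
        'deadlock detected',                      // PostgreSQL
        'database is locked',                     // SQLite
    ];

    foreach ($needles as $needle) {
        if (strpos($e->getMessage(), $needle) !== false) {
            return true;
        }
    }

    return false;
}
```

A worker could then wrap the migration call in a try/catch and simply carry on when causedByDeadlock() returns true.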

taylorotwell commented 8 years ago

Here is a rough draft of what I'm thinking of changing the migration of old jobs to:

            $this->database->beginTransaction();

            $ready = $this->database->table($this->table)
                        ->lockForUpdate()
                        ->where('queue', $this->getQueue($queue))
                        ->where('reserved', 1)
                        ->where('reserved_at', '<=', $expired)
                        ->get();

            $ids = \Illuminate\Support\Collection::make($ready)->map(function ($row) {
                return $row->id;
            })->all();

            $this->database->table($this->table)
                ->whereIn('id', $ids)
                ->update([
                    'reserved' => 0,
                    'reserved_at' => null,
                    'attempts' => new Expression('attempts + 1'),
                ]);

            $this->database->commit();

Thoughts? I think this will avoid any deadlock problems with this query, and in addition I think we should have a small lottery so we don't need to run it on every pass over the queue. Something like a 10% chance of running it on any given iteration?
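The lottery could mirror the odds-style configuration Laravel already uses for session garbage collection: roll on each iteration and only run the migration query on a win. A minimal sketch (the function name is illustrative):

```php
<?php

// Sketch: run the expired-job migration only on a fraction of queue
// iterations. lotteryWins(10, 100) gives roughly a 10% chance per call,
// mirroring the [chances, out_of] odds used by the session GC 'lottery' config.
function lotteryWins($chances, $outOf)
{
    return mt_rand(1, $outOf) <= $chances;
}
```

A worker would then guard the migration with something like: if (lotteryWins(10, 100)) { run the migration query; } else skip it for this iteration.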

taylorotwell commented 8 years ago

So far I have processed 3,500 jobs with 8 workers running at the same time and have had no deadlock issues with this logic.

taylorotwell commented 8 years ago

I've pushed my logic to the 5.2 branch if anyone wants to use "dev" stability and try it out. Would really appreciate feedback! :)

20TRIES commented 8 years ago

Currently running just short of 400k jobs with 11 daemon workers using the database driver. 40k complete so far and only 3 exceptions thrown with the message:

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (SQL: delete from jobs where id = 30990)

20TRIES commented 8 years ago

Those all completed, and I only ever received 4 of the exceptions that I mentioned above.

taylorotwell commented 8 years ago

Nice. Do you have the full stack trace on those exceptions? How many workers were you running?


20TRIES commented 8 years ago

Failed Queue Job: App\Jobs\CreateOrder
SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (SQL: delete from jobs where id = 30990)
vendor/laravel/framework/src/Illuminate/Support/Facades/Facade.php:224 Illuminate\Support\Facades\Facade::__callStatic
app/Listeners/ReportFailedQueueJob.php:61 App\Listeners\ReportFailedQueueJob::handle
[internal] call_user_func_array
vendor/laravel/framework/src/Illuminate/Events/Dispatcher.php:347 Illuminate\Events\Dispatcher::Illuminate\Events{closure}
[internal] call_user_func_array
vendor/laravel/framework/src/Illuminate/Events/Dispatcher.php:221 Illuminate\Events\Dispatcher::fire
app/Libraries/CustomLaravelComponents/Queue/Worker.php:100 App\Libraries\CustomLaravelComponents\Queue\Worker::raiseFailedJobEvent
app/Libraries/CustomLaravelComponents/Queue/Worker.php:82 App\Libraries\CustomLaravelComponents\Queue\Worker::logFailedJob
app/Libraries/CustomLaravelComponents/Queue/Worker.php:36 App\Libraries\CustomLaravelComponents\Queue\Worker::process
vendor/laravel/framework/src/Illuminate/Queue/Worker.php:155 Illuminate\Queue\Worker::pop
vendor/laravel/framework/src/Illuminate/Queue/Worker.php:111 Illuminate\Queue\Worker::runNextJobForDaemon
vendor/laravel/framework/src/Illuminate/Queue/Worker.php:85 Illuminate\Queue\Worker::daemon
vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php:103 Illuminate\Queue\Console\WorkCommand::runWorker
vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php:71 Illuminate\Queue\Console\WorkCommand::fire
[internal] call_user_func_array
vendor/laravel/framework/src/Illuminate/Container/Container.php:507 Illuminate\Container\Container::call
vendor/laravel/framework/src/Illuminate/Console/Command.php:169 Illuminate\Console\Command::execute
vendor/symfony/console/Command/Command.php:256 Symfony\Component\Console\Command\Command::run
vendor/laravel/framework/src/Illuminate/Console/Command.php:155 Illuminate\Console\Command::run
vendor/symfony/console/Application.php:791 Symfony\Component\Console\Application::doRunCommand
vendor/symfony/console/Application.php:186 Symfony\Component\Console\Application::doRun
vendor/symfony/console/Application.php:117 Symfony\Component\Console\Application::run
vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php:107 Illuminate\Foundation\Console\Kernel::handle
artisan:35 [main]

20TRIES commented 8 years ago

Running 11 workers

taylorotwell commented 8 years ago

Interesting. So it was actually on deleting a failed job. Do you know why the jobs failed? Were you randomly causing jobs to fail just to test it?

20TRIES commented 8 years ago

Duplicate primary key entry when inserting; genuine error in the job.

PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1525452' for key 'order_addresses_order_id_unique'

taylorotwell commented 8 years ago

OK Thanks.


20TRIES commented 8 years ago

No probs. Btw, could you take a brief look at the latest on #6368 when you get a moment? I'm not sure if you get notifications for closed issues, but I think it's worth a glance.

20TRIES commented 8 years ago

Oh, btw, worth noting: the error in the job occurred 1276 times, so it seems to be only intermittently that it fails to delete the job from the queue. I've included a Bugsnag screenshot that might illustrate this; the remaining thousands of errors didn't cause any locking problems.

[Bugsnag screenshot: marcus dev server timeline]

taylorotwell commented 8 years ago

OK, just pushed another commit to 5.2. Can you pull that in and see if it solves the deadlocks on the deletes?

Thanks!


rabbitfang commented 8 years ago

Will some (or all) of the fixes be backported to 5.1?

taylorotwell commented 8 years ago

@20TRIES you have a chance to try my latest commit and see if it resolves the delete deadlock?

20TRIES commented 8 years ago

Hey, still getting the same thing.

Is the change below the change that you were trying?

    public function deleteReserved($queue, $id)
    {
        $this->database->beginTransaction();
        if ($this->database->table($this->table)->lockForUpdate()->find($id)) {
            $this->database->table($this->table)->where('id', $id)->delete();
        }
        $this->database->commit();
    }

Stack trace:

[Bugsnag screenshot: failed queue job App\Jobs\CreateOrder on marcus dev server]

20TRIES commented 8 years ago

Could the job record be locked from the last time it was updated?

Would the code below lock more than the one record?

protected function getNextAvailableJob($queue)
{
    $this->database->beginTransaction();

    $job = $this->database->table($this->table)
        ->lockForUpdate()
        ->where('queue', $this->getQueue($queue))
        ->where('reserved', 0)
        ->where('available_at', '<=', $this->getTime())
        ->orderBy('id', 'asc')
        ->first();

    return $job ? (object) $job : null;
}

I think MySQL, in some situations, locks all records that are scanned; I had a similar issue when deleting records.

For example, if I deleted WHERE id IN (1, 5), it would lock 2, 3 and 4 as well.

To get around this I had to specifically delete where id is 1 and then where id is 5, in two separate statements.

I'm not sure if the statement above is similar and locks more than expected, meaning that other records are locked, thus causing issues when other workers attempt to lock or perform updates on them.

http://stackoverflow.com/a/2051370/6140759

http://dev.mysql.com/doc/refman/5.7/en/innodb-locks-set.html

If this is the case, then you would likely have to get the id and then specifically lock that record, catching any exceptions when performing the lock. If the record is already locked when you perform the lock statement, move on to the next record...?

taylorotwell commented 8 years ago

Not sure. Feel free to play with it a bit and see if you can figure something out?

20TRIES commented 8 years ago

I was thinking something like this (I still need to check the exception code):

    protected function getNextAvailableJob($queue)
    {
        $this->database->beginTransaction();

        $job = $this->database
            ->table($this->table)
            ->where('queue', $this->getQueue($queue))
            ->where('reserved', 0)
            ->where('available_at', '<=', $this->getTime())
            ->orderByRaw('RAND()')
            ->first();

        if (! is_null($job)) {
            try {
                $job = $this->database
                    ->table($this->table)
                    ->lockForUpdate()
                    ->where('id', $job->id)
                    ->first();
            } catch (\PDOException $ex) {
                if ($ex->getCode() != '1213') {
                    throw $ex;
                }

                return null;
            }
        }

        return $job;
    }

Do the jobs have to be processed in order of id? Fetching them randomly would reduce the likelihood of workers trying to get the same one...

Will give it a test tomorrow.

taylorotwell commented 8 years ago

From my reading around online, it sounds like it's really not going to be feasible to entirely avoid deadlocks when using a relational database as a queue. It's just not really intended for that purpose. If you're only running a couple of workers it's probably fine, but too many workers is going to start locking things up.


20TRIES commented 8 years ago

Hmm, possibly. I have a couple more things I want to try first before accepting that, though :P

My theory above turned out to be wrong; your previous query to lock the next available job for update works fine and doesn't lock multiple records.

My next theory is:

When you lock the next available job, this doesn't stop another worker from seeing the job as available; it merely makes them wait to lock it themselves. It's not until the second query (an update) is made that the job will be seen as reserved. However slight, there is time between these two queries for a second worker to attempt to lock the job and thus be left waiting for 50 secs until the lock timeout error is thrown. The fact that this window is very short would also explain why I only get a couple of the exceptions when running lots of jobs.

So i propose the following:

protected function getNextAvailableJob($queue)
{
    $job = $this->database
        ->table($this->table)
        ->where('queue', $this->getQueue($queue))
        ->where('reserved', 0)
        ->where('available_at', '<=', $this->getTime())
        ->orderBy('id', 'asc')
        ->first();

    return is_null($job) || $this->markJobAsReserved($job->id) ? $job : $this->getNextAvailableJob($queue);
}

protected function markJobAsReserved($id)
{
    $updates = $this->database
        ->table($this->table)
        ->where('id', $id)
        ->where('reserved', 0)
        ->where('available_at', '<=', $this->getTime())
        ->update([
            'reserved' => 1,
            'reserved_at' => $this->getTime(),
        ]);

    return $updates > 0;
}

The above functions do not lock records; instead, they gracefully handle the case where another worker beat them to reserving the record, by fetching the next available job and starting again.

This could, however, cause a race condition where a worker is constantly blocked by other workers beating it to the punch, so to speak. In that case, I guess the best way forward would be to randomise the order in which jobs are fetched from the queue; this would drastically reduce the likelihood of a worker being blocked many times.

I can't test this out until Monday though as I'm on 4G and don't have the necessary data allowance :P

Will get back to you Monday with an update; let me know what you think :)

20TRIES commented 8 years ago

Seems to be running well with those changes and a couple of other small bits. Will make a pull request tomorrow; changes are a bit big for pasting into a comment.

GrahamCampbell commented 8 years ago

Thank you everyone! :heart:

mpskovvang commented 7 years ago

Great changes!

But what are the thoughts on using both reserved_at and available_at columns to find the next available job? Also, the reserved column seems not to be used at all.

This is the current SQL:

select * from `jobs` where `queue` = ? and ((`reserved_at` is null and `available_at` <= ?) or (`reserved_at` <= ?)) order by `id` asc limit 1 for update

If we modify the usage of the available_at column a little bit, we could end up with:

select * from `jobs` where `queue` = ? and `available_at` <= ? order by `id` asc limit 1 for update

The idea is to increase the available_at timestamp every time the row is reserved. It also gives the possibility of implementing a touch function which increases the timestamp while the job is reserved, giving the script more time to process it.

If the index is changed to ['queue', 'available_at'], MySQL's EXPLAIN for the query gives "Using index condition; Using filesort" with 100% filtered, instead of "Using index condition; Using where; Using filesort" with less filtered.

The reserved column can be removed entirely.
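Under that scheme, availability collapses to a single timestamp comparison, and reserving a job just pushes available_at into the future by the retry timeout. A pure-PHP sketch of the idea (function names are hypothetical, not from the framework):

```php
<?php

// Sketch of the single-column scheme: a job is available whenever
// available_at <= now, whether it was never reserved or its previous
// reservation has expired. Reserving hides it for $retryAfter seconds.
function isAvailable($availableAt, $now)
{
    return $availableAt <= $now;
}

function reservedUntil($now, $retryAfter)
{
    // The new available_at to write when a worker reserves the job.
    return $now + $retryAfter;
}
```

A touch function during processing would simply write a later reservedUntil() value, buying the worker more time before the job reappears to others.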

I haven't done any further tests or benchmark. I just wanted to share my thoughts. :)

Edit: I'm not sure how GitHub notifications work for closed issues. Sorry for pinging you directly if you have already seen this comment. @GrahamCampbell @taylorotwell

ipa1981 commented 7 years ago

As @rabbitfang wondered, could the solution (if it works) be backported to 5.1?

ux-engineer commented 7 years ago

I have 7 named queues with work daemons (1 worker per queue) running on the database driver with Supervisor. Laravel version 5.2.45.

Params: --delay=1 --sleep=3 --tries=3 (Btw, does that delay param even apply at all, as a delay between jobs...?)

Just crunched through about 2,500 jobs quickly and got about 110 deadlock errors. In this particular batch, about 2,400 jobs were on a single queue and about 100 were on another queue, both running fast in parallel.

[2016-12-04 01:06:28] production.ERROR: PDOException: SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction in /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php:335
Stack trace:
#0 /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php(335): PDOStatement->execute(Array)
#1 /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php(722): Illuminate\Database\Connection->Illuminate\Database\{closure}(Object(Illuminate\Database\MySqlConnection), 'select * from `...', Array)
#2 /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php(685): Illuminate\Database\Connection->runQueryCallback('select * from `...', Array, Object(Closure))
#3 /home/project/vendor/laravel/framework/src/Illuminate/Database/Connection.php(349): Illuminate\Database\Connection->run('select * from `...', Array, Object(Closure))
#4 /home/project/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1610): Illuminate\Database\Connection->select('select * from `...', Array, false)
#5 /home/project/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1596): Illuminate\Database\Query\Builder->runSelect()
#6 /home/project/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(1577): Illuminate\Database\Query\Builder->get(Array)
#7 /home/project/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(193): Illuminate\Database\Query\Builder->first()
#8 /home/project/vendor/laravel/framework/src/Illuminate/Queue/DatabaseQueue.php(164): Illuminate\Queue\DatabaseQueue->getNextAvailableJob('queuename3')
#9 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(184): Illuminate\Queue\DatabaseQueue->pop('queuename3')
#10 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(149): Illuminate\Queue\Worker->getNextJob(Object(Illuminate\Queue\DatabaseQueue), 'queuename3')
#11 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(111): Illuminate\Queue\Worker->pop('database', 'queuename3', '5', '3', '3')
#12 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(85): Illuminate\Queue\Worker->runNextJobForDaemon('database', 'queuename3', '5', '3', '3')
#13 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(119): Illuminate\Queue\Worker->daemon('database', 'queuename3', '5', 128, '3', '3')
#14 /home/project/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(78): Illuminate\Queue\Console\WorkCommand->runWorker('database', 'queuename3', '5', 128, true)
#15 [internal function]: Illuminate\Queue\Console\WorkCommand->fire()
#16 /home/project/bootstrap/cache/compiled.php(1257): call_user_func_array(Array, Array)
#17 /home/project/vendor/laravel/framework/src/Illuminate/Console/Command.php(169): Illuminate\Container\Container->call(Array)
#18 /home/project/vendor/symfony/console/Command/Command.php(256): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#19 /home/project/vendor/laravel/framework/src/Illuminate/Console/Command.php(155): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#20 /home/project/vendor/symfony/console/Application.php(794): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#21 /home/project/vendor/symfony/console/Application.php(186): Symfony\Component\Console\Application->doRunCommand(Object(Illuminate\Queue\Console\WorkCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#22 /home/project/vendor/symfony/console/Application.php(117): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#23 /home/project/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(107): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#24 /home/project/artisan(35): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#25 {main}
20TRIES commented 7 years ago

Don't think this was ever fully resolved.

The query that gets the next job and the query that reserves that job need to be combined into one; however, the solution seems to be platform specific.

See #14231. That makes the workers "block" to avoid the exceptions, but there would be some wasted process time every time a worker blocks, and this blocking would still increase with more than 10-15 workers per queue.
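For what it's worth, one platform-specific way to combine the two queries: PostgreSQL (9.5+) can select, lock, and reserve in a single statement with SKIP LOCKED, so competing workers pass over rows another worker holds instead of blocking or deadlocking on them (MySQL only gained SKIP LOCKED later, in 8.0). A sketch, with column names assumed to match Laravel's `jobs` table:

```sql
-- PostgreSQL-only sketch: reserve the next available job in one statement.
-- SKIP LOCKED makes concurrent workers skip rows already locked by another
-- transaction rather than waiting on them or deadlocking.
UPDATE jobs
SET reserved_at = extract(epoch FROM now())
WHERE id = (
    SELECT id FROM jobs
    WHERE queue = 'default'
      AND reserved_at IS NULL
      AND available_at <= extract(epoch FROM now())
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING *;
```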

I switched to SQS queue driver in order to avoid this.

If nothing else, it might be worth checking whether you get any lock issues when running the jobs on a different driver like SQS; it might be that your jobs themselves contain queries which cause some of the locks. That might not be the case, but it's definitely worth checking, as the number of lock errors is quite high; and if it is the case, you might find you can switch back to the database driver once the jobs have been optimised.

lezhnev74 commented 7 years ago

I just faced this with Laravel 5.3.26, with 8 workers and the database driver. We are switching to Redis to remove even the possibility of such a thing on our production server.

I am keeping an eye on this thread to know if this problem was/will be resolved for the database driver.

williamjulianvicary commented 7 years ago

I've run into this as well, testing on a local machine with HTTP requests being handled by the queues, and it is locking the database up completely. The threads get killed by supervisord, but they just restart and get locked again. It doesn't seem that the UPDATEs alone are locking up the database; I can't put my finger on what specifically is doing it, but something is causing the UPDATE statements to hang, get killed by supervisord, and start again, with the same issue every time.

Edit: Please please please make the documentation reflect this oddity; it's obviously quite prevalent, and it's very frustrating to run into this when testing...

taylorotwell commented 7 years ago

The database driver should not be used in a heavy production type environment. It's not the right tool for the job. Heavily suggest any of the other drivers. The database driver is primarily useful in local development environments or in small production environments where you just have 1-2 workers.
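For anyone following along, switching drivers is a one-line config change in Laravel 5.x (plus the target driver's own prerequisites, e.g. a running Redis server and the predis package for the redis driver):

```shell
# .env — point the queue at Redis instead of the database
QUEUE_DRIVER=redis
```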

williamjulianvicary commented 7 years ago

Whilst I appreciate that now, I think myself and many of the other commenters here weren't aware that this would be such an issue, especially not at low volume (3 workers). It would be great if the documentation noted that the DB driver isn't good for queuing at any kind of volume.

I've now switched to Redis; whilst it isn't as nice for testing, it'll be better than sticking with the MySQL workers and running into these kinds of issues.

iamgoodbytes commented 7 years ago

Still seeing this in Laravel 5.4 with 8 workers and database as the driver. Moving to Redis to avoid it. It would definitely help if the documentation mentioned this.

poma commented 7 years ago

Didn't know that a micro instance on AWS with 4 workers is considered high volume. The docs should mention that the DB driver is designed for only 1 worker.