laravel / framework

The Laravel Framework.
https://laravel.com
MIT License
32.16k stars 10.88k forks source link

Multiple crons problem #22129

Closed webmake closed 6 years ago

webmake commented 6 years ago

Description:

After upgrading from 5.5.19 into 5.5.20 started exceptions related with crons. (very rare but happens, and I don't like that)

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (SQL: delete from `multi_server_event` where `mutex` = 74d07339684141a8f7edbeb61c835586 and `complete` < 2017-11-17 23:27:51 and `next` < 2017-11-17 23:29:00)

Together I use https://github.com/jdavidbakr/MultiServerEvent

CACHE_DRIVER=redis

I suspect that doesn't work https://github.com/laravel/framework/blob/5.5/src/Illuminate/Console/Scheduling/Event.php#L170 after changes in 5.5.20. There was 9events in 2days so far (6 yesterday, 3 till now). Cron works every minute, I have 3crons: 1: every minute 2: every minute 3: every day at 6h o'clock Duration for them is not so long, I guess up to 5seconds. Also I have 2servers, so this error means, that same cron is executed when should not been executed.

Can it be Redis related somehow? Or is it changes in 5.5.20 broken functionality at very edge cases?

Dylan-DPC-zz commented 6 years ago

Hey

Can you provide us with more details? Some steps to reproduce?

webmake commented 6 years ago

Like I just said, it's rare problem 0.2% of events, can't understand what causes this problem. If during the day 6 failures out of 2881 (24h60min 2crons every minute +1daily cron), in compare all other previous versions then never happened, since 5.4 laravel. I have noticed that only one interesting thing not mentioned in change log, can it be the cause? image

Kyslik commented 6 years ago

Perhaps make an example repo that we can play with.

webmake commented 6 years ago

Could you please firstly explain how crons are protected from overlapping mechanizm works?

Because soon I will provide POC, that was not solved even in 5.4 laravel.

webmake commented 6 years ago

Okay, experiment proved for me that bug not related with redis mechanism at all, because has does not affected create as I thought.. I guess I have very strange issue, when inserted record in database won't appear for other php process. (I had such issue and I know) so my used library fails on that case.

Anyway, POC, that you both required:

<?php

namespace App\Console;

use Illuminate\Console\Command;
use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    protected $commands = [
        Command1::class,
        Command2::class,
        Command3::class,
    ];

    protected function schedule(Schedule $schedule): void
    {
        $schedule->command(Command1::class)
            ->everyMinute()
            ->withoutOverlapping();
        $schedule->command(Command2::class)
            ->everyMinute()
            ->withoutOverlapping();
        $schedule->command(Command3::class)
            ->everyMinute()
            ->withoutOverlapping();
    }

    protected function commands(): void
    {
        require base_path('routes/console.php');
    }
}

class Command1 extends Command
{
    protected $signature = 'command:first';

    public function handle()
    {
        sleep(3);
        logger($this->signature);
    }
}

class Command2 extends Command
{
    protected $signature = 'command:second';

    public function handle()
    {
        sleep(20);
        logger($this->signature);
    }
}

class Command3 extends Command
{
    protected $signature = 'command:third';

    public function handle()
    {
        sleep(4);
        logger($this->signature);
    }
}

crontab (which simulates 2servers)

* * * * * php /project/artisan schedule:run
* * * * * php /project/artisan schedule:run

result:

[2017-11-18 12:01:42] local.DEBUG: command:first
[2017-11-18 12:01:59] local.DEBUG: command:third
[2017-11-18 12:01:59] local.DEBUG: command:second
[2017-11-18 12:02:22] local.DEBUG: command:third
[2017-11-18 12:02:37] local.DEBUG: command:first
[2017-11-18 12:02:51] local.DEBUG: command:third
[2017-11-18 12:02:55] local.DEBUG: command:second
[2017-11-18 12:03:09] local.DEBUG: command:third
[2017-11-18 12:03:34] local.DEBUG: command:first
[2017-11-18 12:03:51] local.DEBUG: command:third

This is bug, because cron command itself doesn't know that it was executed already or not. Proven by two people with different approaches, but eventually got same results, that third command runs twice in the same minute.

sisve commented 6 years ago

Also I have 2 servers, so this error means, that same cron is executed when should not been executed.

There is a protection against overlapping scheduled jobs in Laravel, which you have to opt-in to, and is based on a shared cache iirc. That protection targets only overlapping jobs, not jobs that execute quickly and several times in one minute at different seconds.

webmake commented 6 years ago

Basically cron means, I want task to be executed once per period. This does not happen. There was a try, but not fully solved.

crynobone commented 6 years ago

Without overlapping works when you have the instance under autoscaling setup, which would cause every instance to run the scheduled task every minutes (which is how the cron supposed to setup).

This setup has been working great and an expected behaviour for us.

On 19 Nov 2017 3:29 am, "webmake" notifications@github.com wrote:

Basically cron means, I want task to be executed once per period. This does not happen. There was a try, but not fully solved.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/laravel/framework/issues/22129#issuecomment-345465264, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKjpiOqCoY0MwSj3P2b6gAr_j1KVW5Jks5s3zApgaJpZM4Qi_l- .

webmake commented 6 years ago

Did you tried an example? It proofs that no matter how much instances you have (I only simulate it on vm with duplicated scheduler, for clearity, as it would be two instances of aws servers or smth like similar) , cron third on same minute is executed on both servers, despite the fact that it was already executed.

That is visual explanation https://user-images.githubusercontent.com/5016464/27125161-0d185162-50fc-11e7-81c0-76bb30645637.png

crynobone commented 6 years ago

I wouldn't reply here if I didn't have this setup on my own production code. We will be heavily affected if multiple instance running the same scheduled at the same triggered time.

Our example already proved that the cron is executed on each minutes based on the result. Which is expected since you declared it everyMinutes()->without overlapping()

crynobone commented 6 years ago

As far as I understand it should prevent you from calling cron from multiple instance at the same time. Whether if Laravel intended to support running process cron or not will determine whether this is a bug or something Laravel never intends to support in the first place.

In our case we never want our cron to run at a very long time, it should create small job for each of it task so it can utilize queue. It much easier to fallback when a job has failed compared to a cron failing.

sisve commented 6 years ago

You're both right. The overlapping stuff works if all your machines execute the jobs at the same time. However, it will not protect you against crontab jittering, slightly out-of-sync clocks and tears in the spacetime continuum.

Also, as the image shows, it will not help if there is a large scheduled task to execute. One of them gets to do it, and will then afterwards start executing the other jobs since the first server has already completed them.

This issue has been present for a long time. I wouldn't call it a bug, the implementation is not meant to be distributed onto several machines. A solution would be to have a dedicated machine that runs the scheduler and dispatches jobs handled by your queue workers.

webmake commented 6 years ago

Well, if there are still people who don't understand how wild this stuff can become image Everything depends on what sequence and duration are of them, and how much servers you have. Having 4crons and 4servers, in perfect world we could manage to run only 4times (if we could control duration of them to be very short). But in samples above, you can see instead of expected executed 4crons, can be 9times or even worse 10 times. (1+2+3+4), that means, linear summation. Don't you think that is bug?

webmake commented 6 years ago

And yes, @sisve , we were advised to dedicate 1 server only for crons, which means additional separated deployment, and so rised costs. Or either to use smart mechanism of queue managing (that crons won't be executed more than expected). But after some experiments we found out that with some tweaks (knowing when cron must be executed next time) multiserver package could solve our problem.

Until now... Why, because for no reason we having deadlocks, I don't see any piece of ours code where starting transactions or dealing with them. So I think it is laravel bug how it deals with crons (despite parallelism problem mentioned above) that forbid from executing two at the same time

sisve commented 6 years ago

Don't you think that is bug?

I do not think this is a bug. The scheduler has never claimed to support the feature you're now describing. It would be an improvement if it worked in these scenarios, but it is currently not a bug in my eyes.

we were advised to dedicate 1 server only for crons, which means additional separated deployment, and so rised costs.

I'm not sure what that additional deployment would be. The only cost I see is the increase is for another machine, and both Digital Ocean and AWS has them for 5 USD per month. The actual application deployment would be identical on all your machines, only the configuration would be different. Only one would call php artisan schedule:run. There is an additional cost, but I do consider it nearly as high as you make it sound.

The multiserver package you link does almost exactly what Laravel does; except it has a few seconds delay after every job where it is still blocking. Even the description of the package states that it is focusing on overlapping jobs; "plugin to allow scheduled events across multiple servers with the same scheduler to not overlap". The distributed overlapping functionality that the linked package provides has been implemented in Laravel 5.4, except Laravel is using a shared cache instead of a database table.

The problem reported in this issue, regarding jdavidbakr/multi-server-event, is about database deadlocks. Report this to the package writer; it's their queries that are failing. Simultaneous access and state management in databases is hard. It's error-prone, it's database-dependent, and even a single index can bring havoc to the locking strategies. Consider moving off that package and use the built-in withoutOverlapping support provided by Laravel.

Neither Laravel, nor the package, provides support for the scenarios described in this issue. The package happens to have a 10 second delay between when a task ends and the mutex is removed, and that is in turn creating additional issues for that package.

In order to prevent a condition where a short-running command's lock doesn't last long enough, we are implementing a minimum 10 second break between the completion of the command and its next execution time, so if a command runs every minute but takes between 50 and 59 seconds to complete, the next command will be delayed one more minute.

Source: https://github.com/jdavidbakr/MultiServerEvent

webmake commented 6 years ago

Btw I forgot to mention, additional downside of dedicated single server for crons - would lead us to single point of failure.

FYI, currently package solves that problem completely, not even via 10s gap. (Sure I will consider to report as well about deadlocks for package owned, if will find out exact the locks problem).

So, if that's not considered to be a bug, then can such feature be implemented, to prevent from executing more than once in laravel itself?

Kyslik commented 6 years ago

I would move this to internals and just reference this issue from there. So no notifications for 900+ people are sent out.

laurencei commented 6 years ago

As @Kyslik said - this is not a "bug". It is a feature limitation that has been present literally forever since Laravel 5.0.0

The original post describes a fault, but it is due to the use of a 3rd party library - so it's not directly related to the core.

Move to Internals to discuss. Or someone can do a PR for Taylor that provides true multi-server scheduling support. I've actually got a pending PR relating to re-writing part of the scheduler (https://github.com/laravel/framework/pull/20403), I think I can add it there with some tweaks - I'll check it out.

laurencei commented 6 years ago

I've done a PR for this feature. Turns out it seems really easy to add this feature: https://github.com/laravel/framework/pull/22137

Lets move any further discussion there...

themsaid commented 6 years ago

Closing this issue and moving the discussion to the PR: https://github.com/laravel/framework/pull/22137