laravel / framework

onOneServer not respected using elastic beanstalk and redis #29230

Closed McKean closed 4 years ago

McKean commented 5 years ago

Description:

When using onOneServer for cron tasks, the task is executed on all instances when deployed on AWS Elastic Beanstalk. It works fine on localhost.

Relevant versions:

    "laravel/framework": "5.8.*",
    "predis/predis": "~1.0",

Redis version on localhost (on AWS we use managed Redis, version 3):

Redis server v=3.2.12 sha=00000000:0 malloc=jemalloc-4.0.3 bits=64 build=9537aa82cd76e6f2

Our setup on AWS: 4 Elastic Beanstalk environments (web, worker1, worker2, preprod-web). The instance count is normally as follows:

    web: 2 (can scale to 8)
    worker1: 1 (can scale to 8)
    worker2: 1 (can scale to 8)
    preprod-web: 1

Cron-scheduled jobs are only triggered on the worker environments, but from what we observe the tasks are processed twice. Based on the Elastic Beanstalk configuration, cron is only triggered on one instance per environment.

We use the onOneServer method and have Redis as the cache store to keep track of the command locking.

Yet we still see the issue.

If we use withoutOverlapping it works as expected. But this isn't really clean.

Any ideas? Happy for any inputs.

Redis db conf:

    'redis' => [

        'client' => 'predis',

        'default' => [
            'host'     => env('REDIS_HOST', 'redis'),
            'password' => env('REDIS_PASSWORD', null),
            'port'     => env('REDIS_PORT', 6379),
            'database' => 0,
        ],

        'cache' => [
            'host'     => env('REDIS_HOST', 'redis'),
            'password' => env('REDIS_PASSWORD', null),
            'port'     => env('REDIS_PORT', 6379),
            'database' => 1,
        ],

    ],

Redis cache store:

...
    'default' => env('CACHE_DRIVER', 'redis'),
...
    'stores' => [
...
        'redis' => [
            'driver' => 'redis',
            'connection' => 'cache',
        ],
    ],
...

What is interesting is that the cron trigger from Elastic Beanstalk will only execute on one instance within an environment, but since we have two worker environments, the trigger fires in both (which is fine). The job itself, however, shouldn't be executed twice.

Sample job:

        $schedule->command('score:calculate')
            ->dailyAt('0:00')
            ->onOneServer()
            ->runInBackground();

What we use as a workaround for now is:

            ->withoutOverlapping(5)

Should this be an acceptable solution? I still think it's worth reporting, so I'm happy to provide additional information and try to get this working.

Steps To Reproduce:

Run a scheduled task with onOneServer on Elastic Beanstalk (very specific, I know) with more than one worker environment.

Thanks!

driesvints commented 5 years ago

Can you please fill out the issue template?

Shkeats commented 5 years ago

A couple of things to consider:

  1. Are all the environments definitely connected to the same Redis instance and cache db number?
  2. Are the clocks synchronized across all of the environments? It might be that withoutOverlapping(5) is locking for long enough (5 minutes) to cover clock drift.

McKean commented 5 years ago

@driesvints adjusted to fit template

@Shkeats

  1. Yes. Proof of that is that there are no duplicate executions when using withoutOverlapping.
  2. Given what onOneServer does (it uses a hash of the job name plus the dispatch time (cron) as a cache key), this should not matter: the first server adds the key to Redis, and the second won't be able to add the same cache key. Additionally, we use Elastic Beanstalk and don't manage the instances ourselves, so the clocks should be as synchronized as possible.
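The locking behaviour described in point 2 can be sketched in plain PHP. This is a minimal illustration, not Laravel's implementation: `FakeSharedCache` and `schedulingKey()` are hypothetical names, and the key layout (a hash of the cron expression and command followed by the hour and minute) is an assumption loosely modeled on the description in this thread.

```php
<?php
// Hypothetical in-memory stand-in for a cache shared by all servers.
final class FakeSharedCache
{
    private $items = [];

    // "Add only if absent": returns true for the first writer, false afterwards.
    public function add(string $key, bool $value): bool
    {
        if (array_key_exists($key, $this->items)) {
            return false;
        }
        $this->items[$key] = $value;
        return true;
    }
}

// Hypothetical key builder (assumed layout, see the note above).
function schedulingKey(string $expression, string $command, DateTimeImmutable $startedAt): string
{
    return 'schedule-' . sha1($expression . $command) . $startedAt->format('Hi');
}

$cache = new FakeSharedCache();
$startedAt = new DateTimeImmutable('2019-08-02 00:00:03', new DateTimeZone('UTC'));
$key = schedulingKey('0 0 * * *', 'score:calculate', $startedAt);

$serverA = $cache->add($key, true); // first server acquires the lock
$serverB = $cache->add($key, true); // second server is refused

var_dump($serverA); // bool(true)
var_dump($serverB); // bool(false)
```

The second server is only refused if both servers build the identical key and talk to the same store, which is why the questions about the Redis instance, the db number, and the clocks matter.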

I have a feeling this is not necessarily a Laravel issue, but I'm hoping someone else has run into this and can point me in the right direction.

driesvints commented 5 years ago

Have you tried two separate redis connections for the cache and queue? I know you've set separate dbs but just to rule out that possibility.

Shkeats commented 5 years ago

@McKean I appreciate that clock drift seems unlikely if AWS controls that for you.

However, it does look to me like the mutex key it puts in the cache inside runSingleServerEvent() is based on the current time as defined in $this->startedAt on the schedule:run command, but only accurate to one minute. So implicitly it allows 1 minute of clock drift but not more. If the servers/environments/whatever were more than a minute out of sync then I think the command would run twice on the scheduler as the cache keys wouldn't match.

I've certainly had misconfigured ec2 instances go much further out of sync than that. You could log the current time and environment inside your Kernel schedule method just to rule this out.
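A rough sketch of that logging idea follows. It assumes a standard Laravel 5.8 `App\Console\Kernel`; the log message and context keys are made up for illustration, and the scheduled command is the sample job from the original report. It only runs inside a Laravel application.

```php
<?php

namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;
use Illuminate\Support\Facades\Log;

class Kernel extends ConsoleKernel
{
    protected function schedule(Schedule $schedule)
    {
        // Diagnostic only: record which host evaluated the schedule and when.
        // Note this may be logged for other artisan commands too, not just schedule:run.
        $now = now();

        Log::info('schedule evaluated', [
            'host'   => gethostname(),
            'env'    => config('app.env'),
            'time'   => $now->format('Y-m-d H:i:s.u'),
            'minute' => $now->format('Hi'), // hour + minute, as discussed in this thread
        ]);

        $schedule->command('score:calculate')
            ->dailyAt('0:00')
            ->onOneServer()
            ->runInBackground();
    }
}
```

Comparing the `minute` values logged by the two worker environments for the same run would show whether they ever disagree.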

Update: it seems that Elastic Beanstalk can have clock drift if the firewall isn't configured to allow NTP over port 123 (UDP).

https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/vpc.html
https://stackoverflow.com/questions/48582929/beanstalk-instance-ntp
https://forums.aws.amazon.com/thread.jspa?messageID=819634

driesvints commented 5 years ago

Honestly, I'd recommend always using a separate Redis connection for your cache and queue. There is also a note about this here: https://laravel.com/docs/5.8/cache#removing-items-from-the-cache

Otherwise you run into issues like this: https://github.com/laravel/framework/issues/29349

Can you please test with separate connections to see if that fixes the problem?

McKean commented 5 years ago

@Shkeats to get the date issue out of our way I did the following:

env A

[ec2-user@ip-xxx ~]$ date
Fri Aug  2 15:32:06 UTC 2019

env B

[ec2-user@ip-yyy ~]$ date
Fri Aug  2 15:32:07 UTC 2019

The one second difference is due to me switching terminal windows. Perhaps I should do the same thing within the container just to be absolutely sure...

edit: I checked the container and the values are the same

@driesvints we'll give this a try and report back, thanks for the idea.

driesvints commented 4 years ago

Closing this issue because it's inactive, already solved, old or not relevant anymore. Feel free to reply if you're still experiencing this issue and we'll re-open this issue.

driade commented 1 year ago

> So implicitly it allows 1 minute of clock drift but not more. If the servers/environments/whatever were more than a minute out of sync then I think the command would run twice on the scheduler as the cache keys wouldn't match.

In my opinion this should be improved by tying the lock not to a single minute but to the number of minutes until the next expected cron run, with its caveats.

I think it's not only a matter of a full minute: in the code we see $time->format('Hi'), so this could happen with just a one-second difference too.
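To illustrate the point about `format('Hi')`, here is a small standalone sketch. `minuteSuffix()` is a hypothetical helper; only the `'Hi'` format string comes from the code mentioned above, and the timestamps are arbitrary examples.

```php
<?php
// Hypothetical helper: returns the hour + minute suffix for a given clock reading.
function minuteSuffix(string $clock): string
{
    return (new DateTimeImmutable($clock, new DateTimeZone('UTC')))->format('Hi');
}

// Seven seconds apart within the same minute: the suffixes match.
$sameMinute = minuteSuffix('2019-08-02 15:32:06') === minuteSuffix('2019-08-02 15:32:13');

// One second apart but across a minute boundary: the suffixes differ.
$acrossBoundary = minuteSuffix('2019-08-02 23:59:59') === minuteSuffix('2019-08-03 00:00:00');

var_dump($sameMinute);     // bool(true)
var_dump($acrossBoundary); // bool(false)
```

So what matters is not the size of the difference but whether the two readings fall on either side of a minute boundary.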

Sorry @driesvints any chance this could be reopened?

McKean commented 1 year ago

It has been a while and I never reported back, sorry about that. @mfurgal do you know whether there are still issues with this?